COMPOSITIONS AND METHODS FOR INHIBITING GENE EXPRESSION

CROSS-REFERENCE TO RELATED APPLICATIONS 

This application claims the priority benefit of U.S. Patent Application 09/545,574, filed April 7, 2000, pending, which is hereby incorporated herein by reference in its entirety.

STATEMENT OF RIGHTS TO INVENTIONS MADE UNDER 
FEDERALLY SPONSORED RESEARCH 
Not applicable.

TECHNICAL FIELD 
This invention is in the field of genetic analysis. Specifically, the invention relates to the generation of a eukaryotic vector that allows bi-directional transcription of a transgene to yield both sense and antisense RNA transcripts from the same transgene. The compositions and methods embodied in the present invention are particularly useful for targeted inhibition of gene expression in a eukaryotic cell.

BACKGROUND OF THE INVENTION

The structure and biological behavior of a cell are determined by the pattern of gene expression within that cell at a given time. Perturbations of gene expression have long been acknowledged to account for a vast number of diseases, including numerous forms of cancer, vascular diseases, and neuronal and endocrine diseases.

Abnormal expression patterns, in the form of amplification, deletion, gene rearrangements, and loss- or gain-of-function mutations, are now known to lead to aberrant behavior of a disease cell. Aberrant gene expression has also been noted as a defense mechanism by which certain organisms ward off the threat of pathogens.
One of the major challenges of genetic engineering has been to regulate the expression of targeted genes that are implicated in a wide diversity of physiological responses. While overexpression of an exogenously introduced transgene in a eukaryotic cell is relatively straightforward, targeted inhibition of specific genes has been more difficult to achieve. Traditional approaches for suppressing gene expression, including site-directed gene disruption and antisense RNA or co-suppressor injection, require complex genetic manipulations or heavy dosages of suppressors that often exceed the toxicity tolerance level of the host cell.

Recently, a new technique, double-stranded RNA (dsRNA) interference, has emerged in the study of gene silencing. Several research groups have demonstrated marked inhibition of specific nuclear gene expression in a wide range of eukaryotes by introducing into cells dsRNA fragments that bear sequence homology to the nuclear gene. For instance, Fire et al. (1998) Nature 395: 854 reported gene-specific interference in C. elegans mediated by ingested E. coli carrying a prokaryotic vector capable of producing both sense and antisense RNAs of the selected C. elegans genes. Misquitta et al. demonstrated the targeted disruption of the nautilus gene in Drosophila melanogaster by injecting multiple copies of nautilus dsRNA into the Drosophila embryo. See Misquitta et al. (1999) PNAS U.S.A. 96: 1451-1456. Studies by Ngo et al. (1998) Proc. Natl. Acad. Sci. U.S.A., 96: 1451-1456 confirmed that dsRNA interference also occurs in certain protozoan species. Earlier studies by Cogoni et al. and Hamilton et al. suggested that formation of dsRNA plays a pivotal role in gene silencing in the fungus Neurospora crassa and in plants. See Cogoni et al. (1999) Nature 399: 166-169; Hamilton et al. (1999) Science 286: 950-952; and Waterhouse et al. (1999) PNAS U.S.A. 95: 13959-13964. More recent investigations by Wargelius et al. revealed that this phenomenon is also conserved in vertebrates such as the zebrafish. Wargelius et al. Biochem. Biophys. Res. Commun. 263: 156-161.

Current techniques for achieving RNA-mediated gene silencing include: (a) use of prokaryotic vectors capable of transcribing both sense and antisense RNA (Fire et al. (1998) Nature 395: 854); (b) in vitro transcription of individual strands of a selected gene followed by annealing of the transcribed sense and antisense RNAs (see, e.g., Misquitta et al. (1999) PNAS U.S.A. 96: 1451-1456); and possibly (c) virus-induced gene silencing (see, e.g., Angell et al. (1997) EMBO Journal 16: 3675-3684; Angell et al. (1999) Plant Journal 20: 357-362). However, these methods bear a number of intrinsic limitations. First, none of these methods employs gene delivery vehicles that are applicable for consistent and persistent inhibition of gene expression in a eukaryote. Second, these existing methods do not necessarily result in production of a substantially homogeneous population of dsRNAs. Notably, the in vitro preparation of double-stranded RNAs by transcribing sense and antisense RNAs and annealing them to each other is time consuming, labor intensive, and not amenable to mass production or high-throughput analyses.
Thus, there remains a considerable need for compositions and methods to effect dsRNA-mediated gene silencing. An ideal reagent would be a self-replicating vector that is (a) capable of autonomous replication and expression of a selected transgene in a eukaryotic cell; and (b) capable of yielding both sense and antisense RNA transcripts from the same transgene, so as to effect production of dsRNA transcripts in a eukaryotic host cell. The present invention satisfies these needs and provides related advantages as well.

SUMMARY OF THE INVENTION 
A principal aspect of the present invention is the design of a eukaryotic recombinant vector to effect gene silencing in a eukaryotic cell that is susceptible to dsRNA-mediated reduction of gene expression. Such a vector allows bi-directional transcription of a transgene to yield both sense and antisense RNA transcripts of the same transgene in a eukaryotic cell. Without being bound to any one theory, the production of dsRNAs induces transcriptional and/or post-transcriptional gene silencing in the host cell. Accordingly, the present invention provides a recombinant vector having the following unique characteristics: it comprises a viral replicon having two overlapping transcription units arranged in an opposing orientation and flanking a transgene of interest, wherein the two overlapping transcription units yield both sense and antisense RNA transcripts from the same transgene fragment in a eukaryotic host cell.

In one aspect of this embodiment, each of the overlapping transcription units of the vector comprises a promoter and a terminator that are arranged in one of the configurations shown in Figure 2(a)-(d). The promoter can be constitutive or inducible; it can be active in all tissues and cell types of an organism or operative only in selected tissues (i.e., tissue-specific).

In another aspect, the recombinant vector comprises a viral replicon that is derived from a DNA virus. Such DNA viruses can be selected from the group consisting of Geminivirus, Caulimoviridae, Badnaviridae, Circoviridae, Circinoviridae, Parvoviridae, Papovaviridae, Polyomaviridae, Adenoviridae, Herpesviridae, Poxviridae, Iridoviridae, Baculoviridae, Hepadnaviridae, Retroviridae, Gyrovirus, Nanovirus, and African Swine Fever virus.

In yet another aspect, the subject vector is capable of autonomous replication in a eukaryotic cell.

In still another aspect, the subject vector is capable of inhibiting expression of genes endogenous to a eukaryotic host cell. Non-limiting representative eukaryotic cells whose gene expression can be inhibited upon introduction of the subject vectors are fungi, yeast cells, plant cells, and insect, avian, mammalian, or other animal cells. Preferably, the vectors effect reduced expression of an endogenous gene that is substantially homologous to the transgene contained in the overlapping transcription units of the vectors. More preferably, delivery of the vectors into a suitable host cell results in a phenotypic change of the host cell. In certain preferred embodiments, the endogenous gene is native to the host cell. The endogenous gene can also be heterologous to the host cell. In some embodiments, the endogenous gene is a pathogenic gene derived from one or more members of the group consisting of viruses, bacteria, fungi, and protozoa. The transgene carried in the vector can be a nucleotide sequence that encodes a membrane protein, a cytosolic protein, a secreted protein, a nuclear protein, or a chaperon protein.

The present invention also provides host cells transformed with the invention vectors. The present invention further provides a transgenic plant comprising a eukaryotic recombinant vector of the present invention.

Also provided by the present invention is a kit for generating a double-stranded RNA transcript in a eukaryotic cell, the kit containing the subject vectors in suitable packaging.

Further embodied in the present invention is a method of inhibiting expression of an endogenous gene present in a eukaryotic cell. The method involves: (a) providing a eukaryotic recombinant vector containing a transgene that is substantially homologous to the endogenous gene; (b) introducing the eukaryotic recombinant vector into the eukaryotic cell; and (c) culturing the eukaryotic cell of (b) under conditions favorable for expression of both sense and antisense RNA transcripts from the transgene contained in the transcription units of the vector, thereby inhibiting expression of the corresponding endogenous gene in the eukaryotic cell.

Also included in the present invention is a method of identifying a biological function(s) of an endogenous gene of interest in a eukaryotic cell by selectively inhibiting the expression of the endogenous gene. The method comprises: (a) providing a eukaryotic recombinant vector containing a transgene that is substantially homologous to the endogenous gene; (b) introducing the eukaryotic recombinant vector of (a) into the eukaryotic cell; (c) culturing the eukaryotic cell of (b) under conditions favorable for expression of both sense and antisense RNA transcripts from the transgene contained in the eukaryotic recombinant vector, thereby inhibiting expression of the endogenous gene in the eukaryotic cell; and (d) determining one or more phenotypic changes in the eukaryotic cell that correlate with the inhibited expression of the endogenous gene, thereby identifying the biological function(s) of the endogenous gene in the eukaryotic cell. In essence, the subject methods allow the creation of a transient or more long-term gene-specific knock-out system for analyzing the biological function of any endogenous gene of interest.

BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 is a schematic representation of the process for production of dsRNA transcripts by a subject vector containing two overlapping transcription units.

Figure 2(a)-(d) depict four different configurations of the overlapping transcription units of the subject vectors.

Figure 3 is a schematic representation of an exemplary construct, MSVLSB-6.

Figure 4 depicts the nucleotide sequence of the vector pMSVLSB-1 (SEQ ID NO: 9) described in Examples 1-2.

Figure 5 depicts the nucleotide sequence of the vector pMSVLSB-2 (SEQ ID NO: 10) described in Examples 1-2.

Figure 6 depicts the nucleotide sequence of the vector pMSVLSB-3 (SEQ ID NO: 11) described in Examples 1-2.

Figure 7 depicts the nucleotide sequence of the vector pMSVLSB-4 (SEQ ID NO: 12) described in Examples 1-2.

Figure 8 depicts the nucleotide sequence of the vector pMSVLSB-5 (SEQ ID NO: 13) described in Examples 1-2.

Figure 9 depicts the nucleotide sequence of the vector pMSVLSB-6 (SEQ ID NO: 14) described in Examples 1-2.

MODES FOR CARRYING OUT THE INVENTION 
Throughout this disclosure, various publications, patents and published patent specifications are referenced by an identifying citation. The disclosures of these publications, patents and published patent specifications are hereby incorporated by reference into the present disclosure to more fully describe the state of the art to which this invention pertains.



General Techniques:

The practice of the present invention will employ, unless otherwise indicated, conventional techniques of immunology, biochemistry, chemistry, molecular biology, microbiology, cell biology, genomics, and recombinant DNA, which are within the skill of the art. See, e.g., Matthews, PLANT VIROLOGY, 3rd edition (1991); Sambrook, Fritsch and Maniatis, MOLECULAR CLONING: A LABORATORY MANUAL, 2nd edition (1989); CURRENT PROTOCOLS IN MOLECULAR BIOLOGY (F. M. Ausubel, et al. eds., (1987)); the series METHODS IN ENZYMOLOGY (Academic Press, Inc.); PCR 2: A PRACTICAL APPROACH (M.J. MacPherson, B.D. Hames and G.R. Taylor eds. (1995)); Harlow and Lane, eds. (1988) ANTIBODIES, A LABORATORY MANUAL; and ANIMAL CELL CULTURE (R.I. Freshney, ed. (1987)).

As used in the specification and claims, the singular forms "a", "an" and "the" include plural references unless the context clearly dictates otherwise. For example, the term "a cell" includes a plurality of cells, including mixtures thereof.
Definitions:

A "plant cell" refers to the structural and physiological unit of plants, consisting of a protoplast and the cell wall.

A "protoplast" is an isolated cell without a cell wall, having the potential to regenerate into a cell culture, tissue, or whole plant.

A "host cell" includes an individual cell or cell culture that can be or has been a recipient for vector(s) or for incorporation of nucleic acid molecules and/or proteins. Host cells include progeny of a single host cell, and the progeny may not necessarily be completely identical (in morphology or in genomic or total DNA complement) to the original parent cell due to natural, accidental, or deliberate mutation. A host cell includes cells transfected in vivo with a polynucleotide(s) of this invention.

The terms "polynucleotide", "nucleotides" and "oligonucleotides" are used interchangeably. They refer to a polymeric form of nucleotides of any length, either deoxyribonucleotides or ribonucleotides, or analogs thereof. Polynucleotides may have any three-dimensional structure, and may perform any function, known or unknown. The following are non-limiting examples of polynucleotides: coding or non-coding regions of a gene or gene fragment, loci (locus) defined from linkage analysis, exons, introns, messenger RNA (mRNA), transfer RNA, ribosomal RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers. A polynucleotide may comprise modified nucleotides, such as methylated nucleotides and nucleotide analogs. If present, modifications to the nucleotide structure may be imparted before or after assembly of the polymer. The sequence of nucleotides may be interrupted by non-nucleotide components. A polynucleotide may be further modified after polymerization, such as by conjugation with a labeling component.

A "gene" refers to a polynucleotide containing at least one open reading frame that is capable of encoding a particular protein after being transcribed and translated.
"Genes of a specific developmental origin" refer to genes expressed at
certain but not all developmental stages. For instance, a gene may be of embryonic 
or adult origin depending on the stage during which the gene is expressed. 

A "disease-associated" or "disease-causing" gene refers to any gene that yields transcription or translation products at an abnormal level or in an abnormal form in cells derived from disease-affected tissues, compared with tissues or cells of a control. It may be a gene that becomes expressed at an abnormally high level, or a gene that becomes expressed at an abnormally low level, where the altered expression correlates with the occurrence and/or progression of the disease. A disease-associated gene also refers to a gene possessing mutation(s) or genetic variation that is directly responsible for, or is in linkage disequilibrium with gene(s) responsible for, the etiology of a disease. The transcribed or translated products may be known or unknown, and may be at a normal or abnormal level.

A gene "database" denotes a set of stored data that represent a collection of sequences, including nucleotide and peptide sequences, which in turn represent a collection of biological reference materials.

As used herein, "expression" refers to the process by which a polynucleotide is transcribed into mRNA and/or the process by which the transcribed mRNA (also referred to as a "transcript") is subsequently translated into peptides, polypeptides, or proteins. The transcripts and the encoded polypeptides are collectively referred to as gene products. If the polynucleotide is derived from genomic DNA, expression may include splicing of the mRNA in a eukaryotic cell.

"Differentially expressed", as applied to a nucleotide sequence or polypeptide sequence in a subject, refers to over-expression or under-expression of that sequence when compared to that detected in a control. Under-expression also encompasses absence of expression of a particular sequence, as evidenced by the absence of detectable expression in a test subject when compared to a control.

"Differential expression" refers to alterations in the abundance or the expression pattern of a gene product.
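As a minimal sketch of the "differentially expressed" definition, the Python function below compares a measured level in a test subject with the level in a control. The function name `classify_expression` and the 2-fold cutoff are illustrative assumptions, not terms or values taken from this specification. Consistent with the definition above, zero test expression is reported as under-expression.

```python
def classify_expression(test_level: float, control_level: float,
                        fold_threshold: float = 2.0) -> str:
    """Classify a sequence's expression relative to a control.

    fold_threshold is a hypothetical cutoff chosen for illustration only.
    """
    if control_level <= 0:
        raise ValueError("control level must be positive")
    if fold_threshold <= 1:
        raise ValueError("fold_threshold must exceed 1")
    if test_level == 0:
        # No detectable expression in the test subject: treated as under-expression.
        return "under-expressed (absent)"
    fold_change = test_level / control_level
    if fold_change >= fold_threshold:
        return "over-expressed"
    if fold_change <= 1.0 / fold_threshold:
        return "under-expressed"
    return "not differentially expressed"


print(classify_expression(8.0, 2.0))  # over-expressed
print(classify_expression(0.0, 2.0))  # under-expressed (absent)
```

In practice the cutoff, and any statistical test that accompanies it, would be chosen for the assay at hand.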
A "primer" is a short polynucleotide, generally with a free 3'-OH group, that binds to a target or "template" potentially present in a sample of interest by hybridizing with the target, and thereafter promotes polymerization of a polynucleotide complementary to the target.
The term "hybridize" as applied to a polynucleotide refers to the ability of the polynucleotide to form a complex that is stabilized via hydrogen bonding between the bases of the nucleotide residues in a hybridization reaction. The hydrogen bonding may occur by Watson-Crick base pairing, Hoogsteen binding, or in any other sequence-specific manner. The complex may comprise two strands forming a duplex structure, three or more strands forming a multi-stranded complex, a single self-hybridizing strand, or any combination of these. The hybridization reaction may constitute a step in a more extensive process, such as the initiation of a PCR reaction or the enzymatic cleavage of a polynucleotide by a ribozyme.

Hybridization can be performed under conditions of different "stringency". Relevant conditions include temperature, ionic strength, time of incubation, the presence of additional solutes in the reaction mixture such as formamide, and the washing procedure. Higher stringency conditions are those conditions, such as higher temperature and lower sodium ion concentration, that require higher minimum complementarity between hybridizing elements for a stable hybridization complex to form. In general, a low stringency hybridization reaction is carried out at about 40 °C in 10x SSC or a solution of equivalent ionic strength/temperature. A moderate stringency hybridization is typically performed at about 50 °C in 6x SSC, and a high stringency hybridization reaction is generally performed at about 60 °C in 1x SSC.
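The nominal conditions in the preceding paragraph can be recorded as a small lookup table. This Python sketch only restates the approximate values given above and is not a hybridization protocol. The dictionary and function names are hypothetical. It also checks the stated trend that higher stringency pairs higher temperature with lower salt.

```python
# Nominal conditions quoted in the text: (temperature in °C, SSC strength as a
# multiple of 1x SSC). Actual stringency also depends on incubation time,
# formamide and other solutes, and the washing procedure.
STRINGENCY_CONDITIONS = {
    "low": (40, 10),
    "moderate": (50, 6),
    "high": (60, 1),
}


def describe_stringency(level: str) -> str:
    """Return a short description of the nominal conditions for a stringency level."""
    temperature_c, ssc_x = STRINGENCY_CONDITIONS[level]
    return f"hybridize at about {temperature_c} °C in {ssc_x}x SSC"


# Higher stringency pairs higher temperature with lower salt, as stated in the text.
levels = ["low", "moderate", "high"]
temps = [STRINGENCY_CONDITIONS[lvl][0] for lvl in levels]
salts = [STRINGENCY_CONDITIONS[lvl][1] for lvl in levels]
assert temps == sorted(temps) and salts == sorted(salts, reverse=True)

print(describe_stringency("high"))  # hybridize at about 60 °C in 1x SSC
```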

When hybridization occurs in an antiparallel configuration between two single-stranded polynucleotides, the reaction is called "annealing" and those polynucleotides are described as "complementary". A double-stranded polynucleotide can be "complementary" or "homologous" to another polynucleotide if hybridization can occur between one of the strands of the first polynucleotide and the second. "Complementarity" or "homology" (the degree to which one polynucleotide is complementary with another) is quantifiable in terms of the proportion of bases in opposing strands that are expected to form hydrogen bonds with each other, according to generally accepted base-pairing rules.
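The proportion described above can be computed directly. The Python sketch below assumes two equal-length strands, each written 5' to 3' and aligned antiparallel without gaps. It counts only Watson-Crick pairs and ignores G-U wobble pairs and mismatch tolerance, so it simplifies real hybridization behavior. The function name is hypothetical.

```python
# Watson-Crick pairs only; G-U wobble pairs in RNA are deliberately ignored
# in this simplified sketch.
WATSON_CRICK_PAIRS = {
    ("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"),
    ("A", "U"), ("U", "A"),
}


def percent_complementarity(strand1: str, strand2: str) -> float:
    """Percentage of positions that base-pair when two equal-length strands,
    each written 5' to 3', are aligned antiparallel without gaps."""
    if not strand1 or len(strand1) != len(strand2):
        raise ValueError("strands must be non-empty and of equal length")
    s1 = strand1.upper()
    # Antiparallel alignment: the 5' end of strand1 faces the 3' end of strand2.
    s2_reversed = strand2.upper()[::-1]
    paired = sum((a, b) in WATSON_CRICK_PAIRS for a, b in zip(s1, s2_reversed))
    return 100.0 * paired / len(s1)


print(percent_complementarity("ATGC", "GCAT"))  # 100.0
print(percent_complementarity("AAAA", "TTTC"))  # 75.0
```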

In the context of polynucleotides, a "linear sequence" or a "sequence" is an order of nucleotides in a polynucleotide in a 5' to 3' direction in which residues that neighbor each other in the sequence are contiguous in the primary structure of the polynucleotide. A "partial sequence" is a linear sequence of part of a polynucleotide that is known to comprise additional residues in one or both directions.

The terms "cytosolic", "nuclear" and "secreted" as applied to cellular proteins specify the extracellular and/or subcellular location in which the cellular protein is mostly localized. Certain proteins are "chaperons", capable of translocating back and forth between the cytosol and the nucleus of a cell.

A "subject" as used herein refers to a biological entity containing expressed genetic materials. The biological entity can be a plant, an animal, or a microorganism, including bacteria, viruses, fungi, and protozoa. Tissues, cells, and their progeny of a biological entity obtained in vivo or cultured in vitro are also encompassed.

A "control" is an alternative subject or sample used in an experiment for comparison purposes. A control can be "positive" or "negative". For example, where the purpose of the experiment is to detect a differentially expressed transcript or polypeptide in a cell or tissue affected by a disease of concern, it is generally preferable to use a positive control (a subject or a sample from a subject exhibiting such differential expression and the syndromes characteristic of that disease) and a negative control (a subject or a sample from a subject lacking the differential expression and clinical syndrome of that disease).

"Heterologous" means derived from a genotypically distinct entity from the rest of the entity to which it is being compared. For example, a promoter removed from its native coding sequence and operatively linked to a coding sequence other than the native sequence is a heterologous promoter.

A "cell line" or "cell culture" denotes bacterial, plant, insect, or higher eukaryotic cells grown or maintained in vitro. The descendants of a cell may not be completely identical (either morphologically, genotypically, or phenotypically) to the parent cell.

A "vector" is a nucleic acid molecule, preferably self-replicating, that transfers an inserted nucleic acid molecule into and/or between host cells. The term includes vectors that function primarily for insertion of a DNA or RNA into a cell, replication vectors that function primarily for the replication of DNA or RNA, and expression vectors that function for transcription and/or translation of the DNA or RNA. Also included are vectors that provide more than one of the above functions.

An "expression vector" is a polynucleotide that, when introduced into an appropriate host cell, can be transcribed and translated into a polypeptide(s). An "expression system" usually connotes a suitable host cell comprising an expression vector that can function to yield a desired expression product.

A "replicon" refers to a polynucleotide comprising an origin of replication (generally referred to as an ori sequence) that allows for replication of the polynucleotide in an appropriate host cell. Examples of replicons include episomes (such as plasmids), as well as chromosomes (such as the nuclear or mitochondrial chromosomes).

A "transcription unit" is a DNA segment capable of directing transcription of a gene or fragment thereof. Typically, a transcription unit comprises a promoter operably linked to a gene or a DNA fragment that is to be transcribed, and optionally regulatory sequences located either upstream or downstream of the initiation site or the termination site of the transcribed gene or fragment.

Vectors of the present invention 

A central aspect of the present invention is the design of a recombinant vector suited for bi-directional transcription of a transgene to yield both sense and antisense RNA transcripts of the transgene in a eukaryotic cell. The invention vectors are particularly suited for mediating nuclear gene silencing in a variety of biological systems. Distinguished from previously described DNA vectors, the subject vectors have the following unique characteristics: (a) the vector replicates and directs expression of a transgene in a eukaryotic cell; and (b) the vector comprises a replicon having two overlapping transcription units arranged in an opposing orientation and flanking a transgene of interest, wherein the two overlapping transcription units yield both sense and antisense RNA transcripts from the same transgene in a eukaryotic host cell.
Several factors apply to the design of vectors having the above-mentioned characteristics. First, the vector comprises a replicon having an origin of replication (generally referred to as an ori sequence) that permits replication of the vector in a eukaryotic host cell. A preferred replicon is one comprising viral sequences capable of directing autonomous replication of the vector in an appropriate host cell. Non-limiting examples of viral replicons include sequences derived from DNA viruses such as Geminivirus, Caulimoviridae, Badnaviridae, Circoviridae, Circinoviridae, Parvoviridae, Papovaviridae, Polyomaviridae, Adenoviridae, Herpesviridae, Poxviridae, Iridoviridae, Baculoviridae, Hepadnaviridae, Retroviridae, Gyrovirus, Nanovirus, and African Swine Fever virus, or the like. In addition to the replication origin, a replicon typically carries a transcription unit that directs transcription of a transgene or a fragment thereof to yield a plurality of RNA transcripts.

A second consideration in designing the subject vector is to select two overlapping transcription units. By "overlapping" is meant that the two transcription units direct transcription of both DNA strands of the same transgene to yield a plurality of partially or perfectly double-stranded RNA transcripts. The two overlapping transcription units are typically arranged in an opposing orientation so that each unit can drive transcription of one of the complementary strands of the same transgene, and thus facilitate the generation of double-stranded RNA transcripts. Elements within a transcription unit include, but are not limited to, promoter regions, enhancer regions, repressor binding regions, transcription initiation sites, ribosome binding sites, translation initiation sites, protein-encoding regions and introns, and termination sites for transcription and translation. Preferred transcription units are arranged in one of the configurations shown in Figure 2(a)-(d).
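The logic of the two opposing transcription units can be illustrated with a short Python sketch. One unit yields a transcript matching the transgene's top strand. The other yields its reverse complement. Over the transgene region, the two products are fully complementary. The function names and the example sequence are hypothetical. The sketch also ignores the flanking vector sequence that real transcripts would carry, which is one reason actual products may be only partially double stranded.

```python
# Complement table for DNA bases; sequences are assumed to contain only A, C, G, T.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")
RNA_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def sense_and_antisense(transgene_top_strand: str) -> tuple[str, str]:
    """Return (sense RNA, antisense RNA), both written 5' to 3'.

    The sense transcript has the top-strand sequence (T -> U); the antisense
    transcript, driven by the opposing promoter, is the reverse complement of
    the top strand (T -> U). Flanking vector sequence is ignored.
    """
    top = transgene_top_strand.upper()
    sense_rna = top.replace("T", "U")
    antisense_rna = top.translate(DNA_COMPLEMENT)[::-1].replace("T", "U")
    return sense_rna, antisense_rna


def forms_perfect_duplex(rna1: str, rna2: str) -> bool:
    """True if two 5'->3' RNA strands pair at every position when aligned antiparallel."""
    return len(rna1) == len(rna2) and all(
        (a, b) in RNA_PAIRS for a, b in zip(rna1, reversed(rna2))
    )


sense, antisense = sense_and_antisense("ATGGCTTAC")
print(sense)      # AUGGCUUAC
print(antisense)  # GUAAGCCAU
print(forms_perfect_duplex(sense, antisense))  # True
```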

As used herein, a "promoter" is a DNA region capable under certain conditions of binding RNA polymerase and initiating transcription of a coding region located downstream (in the 3' direction) from the promoter. It can be constitutive or inducible. In general, the promoter sequence is bounded at its 3' terminus by the transcription initiation site and extends upstream (5' direction) to include the minimum number of bases or elements necessary to initiate transcription at levels detectable above background. Within the promoter sequence are a transcription initiation site and protein binding domains responsible for the binding of RNA polymerase. Eukaryotic promoters will often, but not always, contain "TATA" boxes and "CAAT" boxes.
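As a rough illustration, the Python sketch below scans one strand of a candidate upstream region for TATA-like and CAAT-like elements using regular expressions. The patterns are simplified consensus forms chosen as assumptions: TATAWAW (W = A or T) and the CCAAT core. A match does not establish a functional promoter, and many promoters lack either element. The function name is hypothetical.

```python
import re

# Simplified consensus patterns (assumptions for illustration): a TATA-like
# element TATAWAW (W = A or T) and the CCAAT core of a CAAT box. Zero-width
# lookaheads let overlapping occurrences be reported.
TATA_LIKE = re.compile(r"(?=TATA[AT]A[AT])")
CAAT_LIKE = re.compile(r"(?=CCAAT)")


def find_core_motifs(upstream_seq: str) -> dict[str, list[int]]:
    """Return 0-based start positions of TATA-like and CAAT-like motifs on the given strand."""
    seq = upstream_seq.upper()
    return {
        "TATA": [m.start() for m in TATA_LIKE.finditer(seq)],
        "CAAT": [m.start() for m in CAAT_LIKE.finditer(seq)],
    }


print(find_core_motifs("GGCCAATCTTATAAATAGG"))  # {'TATA': [9], 'CAAT': [2]}
```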

The choice of promoters will largely depend on the host cells into which the vector is introduced. Commonly employed plant promoters include, but are not limited to, those from Agrobacterium genes such as nopaline synthase, octopine synthase, and mannopine synthase, and that of rbcS (small subunit of ribulose bisphosphate carboxylase). In addition, the promoter sequences may be provided by viral material. Any of the RNA virus subgenomic promoters described in Dawson et al. Advances in Virus Research, 38: 307-342 and WO93/03161 can thus be employed. For animal cells, a variety of robust promoters, both viral and non-viral, are known in the art. Non-limiting representative viral promoters include CMV, the early and late promoters of SV40 virus, and promoters of various types of adenoviruses (e.g., adenovirus 2) and adeno-associated viruses. It is also possible, and often desirable, to utilize promoters normally associated with a desired transgene sequence, provided that such control sequences are compatible with the host cell system. See Goeddel et al., Gene Expression Technology, Methods in Enzymology Volume 185, Academic Press, San Diego (1991); Ausubel et al., Protocols in Molecular Biology, Wiley Interscience (1994).

Suitable promoter sequences for other eukaryotic cells, such as yeast cells, include the promoters for 3-phosphoglycerate kinase or other glycolytic enzymes, such as enolase, glyceraldehyde-3-phosphate dehydrogenase, hexokinase, pyruvate decarboxylase, phosphofructokinase, glucose-6-phosphate isomerase, 3-phosphoglycerate mutase, pyruvate kinase, triosephosphate isomerase, phosphoglucose isomerase, and glucokinase. Other promoters, which have the additional advantage of transcription controlled by growth conditions, are the promoter regions for alcohol dehydrogenase 2, isocytochrome C, acid phosphatase, degradative enzymes associated with nitrogen metabolism, the aforementioned glyceraldehyde-3-phosphate dehydrogenase, and enzymes responsible for maltose and galactose utilization.

To optimize the yield of double-stranded RNAs formed from the sense and antisense strands transcribed by the overlapping units, it is preferable to use two promoters of comparable strength. The relative strength of the promoters can be determined by any conventional recombinant techniques and methods exemplified herein. Representative techniques are Northern blot hybridization and DNA array-based technologies. An illustrative promoter pair comprises the MSV mp promoter and the CaMV 35S RNA promoter.
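The reason for balanced promoter strength can be made concrete with simple stoichiometry. As a simplifying assumption, each sense transcript is taken to anneal with at most one antisense transcript and annealing is taken to be complete. Under that assumption, the attainable dsRNA is capped by the less abundant strand, and the excess remains single-stranded. The Python sketch below computes that upper bound from relative transcript levels, such as normalized Northern blot signals. The function name and input values are hypothetical, not measured data.

```python
def duplex_yield(sense: float, antisense: float) -> dict[str, float]:
    """Upper-bound dsRNA estimate from sense and antisense transcript amounts.

    Assumes (hypothetically) complete 1:1 annealing, so the number of duplexes
    is limited by the less abundant strand; the excess stays single-stranded.
    """
    if sense < 0 or antisense < 0:
        raise ValueError("transcript amounts cannot be negative")
    total = sense + antisense
    duplexes = min(sense, antisense)
    return {
        "max_duplexes": duplexes,
        "unpaired_single_strands": abs(sense - antisense),
        "fraction_in_duplex": 2 * duplexes / total if total else 0.0,
    }


# Hypothetical relative signals, not measured data.
print(duplex_yield(100, 100))  # balanced promoters
print(duplex_yield(100, 25))   # one promoter four-fold weaker
```

With balanced promoters every transcript can, in this idealized model, end up in a duplex. A four-fold imbalance caps the duplexed fraction at 40% of all transcripts.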

Where desired, heterologous promoters that are removed from their native coding sequences and operatively linked to a transgene to which they are not naturally linked can be used in constructing the invention vectors. As such, any of the viral promoters described above can be used to drive the transcription of a non-viral transgene; promoters of one class of genes can be employed to direct transcription of transgenes coding for other related or unrelated classes of proteins. In certain embodiments of the invention, it is preferable to employ inducible promoters to control the transcription of a transgene. A diverse variety of inducible promoters have been described in the art. Promoters of any endogenous genes whose expression is inducible by internal or external factors can be employed. Factors applicable for transcription induction include, but are not limited to, hormones, heat shock, oxygen deficiency, light, stress, and various chemicals. Commonly employed inducible promoters are the β-gal promoter, which is activated upon addition of IPTG; the hsp70 promoter, which is inducible by heat shock; and the ribulose-1,5-bisphosphate carboxylase (RUBISCO) promoter, which is regulated by light.

Tissue-specific promoters may also be used. A vast diversity of tissue-specific
promoters have been described and employed by artisans in the field.
Representative plant tissue promoters include those of legumin (or other seed storage
protein promoters), patatin and the like. Exemplary promoters operative in selective
animal tissues include hepatocyte-specific promoters and cardiac muscle-specific
promoters. Depending on the intended use of the subject vectors, those skilled in the
art will know of other suitable tissue-specific promoters applicable for non-constitutive
bi-directional transcription.

In constructing the subject vectors, the termination sequences associated with
the transgene are also inserted at the 3' end of the sequence desired to be
transcribed to provide polyadenylation of the mRNA and/or a transcriptional
termination signal. The terminator sequence preferably contains one or more
transcriptional termination sequences (such as polyadenylation sequences) and may
also be lengthened by the inclusion of additional DNA sequence so as to further
disrupt transcriptional read-through. Preferred terminator sequences (or termination
sites) of the present invention have a gene that is followed by a transcription
termination sequence, either its own termination sequence or a heterologous
termination sequence. Examples of such termination sequences, including stop
codons coupled to various polyadenylation sequences, are known in the art,
widely available, and exemplified below. Where the terminator comprises a gene, it
can be advantageous to use a gene which encodes a detectable or selectable marker,
thereby providing a means by which the presence and/or absence of the terminator
sequence (and therefore the corresponding inactivation and/or activation of the
transcription unit) can be detected and/or selected. Alternatively, a terminator may
simply be a second promoter, arranged in inverted orientation to the promoter
described above.

The terminators and promoters of the two overlapping transcription units
may take a variety of configurations. In one aspect, terminators 1 and 2 of the
overlapping transcription units are arranged to immediately flank the transgene as
shown in Figure 2(a). In another aspect, the two terminators are placed at the 5' end
or the 3' end of their respective promoters as depicted in Figure 2(b). In other
aspects, terminator 1 and promoter 1 are flanked by terminator 2 and promoter 2 as
shown in Figure 2(c), or vice versa (see Figure 2(d)). Any other variations in
configuring the two overlapping transcription units that permit bi-directional
transcription are encompassed by the present invention.

The transgene transcribed by an invention vector can be any gene expressed
in a eukaryotic cell. The selection of the transgene is determined largely by the
intended purpose of the vector. Where the vector is used to inhibit expression of an
endogenous gene present in a host cell, the selected transgene is substantially
homologous to the target endogenous gene. In general, substantially homologous
nucleotide sequences are at least about 60% identical with each other after
alignment of the homologous regions. Preferably, the sequences are at least about
75% identical; more preferably, they are at least about 80% identical; more
preferably, they are at least about 90% identical; still more preferably, the sequences
are 95% identical.

Sequence alignment and homology searches are often performed with the
aid of computer methods. A variety of software programs are available in the art.
Non-limiting examples of these programs are Blast
(http://www.ncbi.nlm.nih.gov/BLAST/), Fasta (Genetics Computer Group
package, Madison, Wisconsin), DNASTAR, MegAlign, and GeneJockey. Any
sequence database that contains DNA sequences corresponding to a target gene or a
segment thereof can be used for sequence analysis. Commonly employed databases
include but are not limited to GenBank, EMBL, DDBJ, PDB, SWISS-PROT, EST,
STS, GSS, and HTGS. Sequence similarity can be discerned by aligning the
transgene sequence against a target endogenous gene sequence. Common
parameters for determining the extent of homology set forth by one or more of the
aforementioned alignment programs include p value and percent sequence identity.
The p value is the probability that the alignment is produced by chance. For a single
alignment, the p value can be calculated according to Karlin et al. (1990) Proc. Natl.
Acad. Sci. 87: 2264. For multiple alignments, the p value can be calculated using a
heuristic approach such as the one programmed in Blast. Percent sequence identity
is defined by the ratio of the number of nucleotide matches between the query
sequence and the known sequence, when the two are optimally aligned, to the length
of the aligned region. A selected transgene and target endogenous sequences are
considered to be substantially homologous when the regions of alignment exhibit the
aforementioned range of percentage of identity using the Fasta or Blast alignment
program with the default settings.
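
The two alignment parameters discussed above can be illustrated with a short Python sketch. It assumes the transgene and target sequences have already been optimally aligned (for example, by Blast or Fasta) and are supplied as equal-length strings with '-' marking gaps; percent identity is computed over the full alignment length, which is only one of several conventions in use. The p value follows the Karlin-Altschul relationship for a single ungapped local alignment, E = K*m*n*exp(-lambda*S) and P = 1 - exp(-E), where lambda and K must be derived for the scoring scheme actually used. The numerical values in the example and the function names are hypothetical and do not correspond to any program named above.

```python
import math

# Identity tiers recited above, from most to least preferred (percent).
IDENTITY_TIERS = (95.0, 90.0, 80.0, 75.0, 60.0)


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    A column counts as a match only when both characters are identical
    and neither is a gap ('-'). The denominator is the alignment length.
    """
    if not aln_a or len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must be non-empty and of equal length")
    matches = sum(
        1 for x, y in zip(aln_a.upper(), aln_b.upper()) if x == y and x != "-"
    )
    return 100.0 * matches / len(aln_a)


def identity_tier(identity: float):
    """Return the highest recited tier met, or None if below 60%."""
    for tier in IDENTITY_TIERS:
        if identity >= tier:
            return tier
    return None


def karlin_altschul_p(score: float, m: int, n: int, lam: float, k: float) -> float:
    """P value of a single ungapped local alignment score.

    m and n are the lengths of the query and database sequences; lam and
    k are the Karlin-Altschul parameters for the scoring scheme in use.
    """
    e_value = k * m * n * math.exp(-lam * score)
    # 1 - exp(-E), computed with expm1 to stay accurate when E is tiny.
    return -math.expm1(-e_value)


if __name__ == "__main__":
    pid = percent_identity("ACGTACGTAC", "ACGTTCGTAC")
    print(pid, identity_tier(pid))  # 90.0 90.0
    # Hypothetical lambda and K; real values depend on the scoring scheme.
    print(karlin_altschul_p(40, m=500, n=1_000_000, lam=1.37, k=0.711))
```

Higher alignment scores give exponentially smaller p values, which is why a long, highly identical transgene region is readily distinguished from a chance match.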

Sequence homology can also be determined by functional analyses. A
sequence that preserves the functionality of the nucleic acid with which it is being
compared is particularly preferred. Functionality may be established by different
criteria, such as the ability to hybridize with a target polynucleotide, the ability to
effectively amplify a target sequence to yield a substantially homogeneous
multiplicity of products, and the ability to extend the 3' end sequence
complementary to a target sequence in a nucleotide sequencing reaction.

Where desired, the transgene may comprise heterologous sequences that
facilitate detection of expression and purification of the gene product. Examples
of such sequences are known in the art and include those encoding reporter proteins
such as β-galactosidase, β-lactamase, chloramphenicol acetyltransferase (CAT),
luciferase, green fluorescent protein (GFP) and their derivatives. Other heterologous
sequences that facilitate purification may code for epitopes such as Myc, HA
(derived from influenza virus hemagglutinin), His-6, FLAG, glutathione
S-transferase (GST), maltose-binding protein (MBP), or the Fc portion of
immunoglobulin.

The target endogenous genes whose expression is to be inhibited encompass
native and heterologous genes present in the host cell. "Native" genes are nucleic
acid sequences originating from the host cell. Non-limiting illustrative native genes
include those encoding membrane proteins, cytosolic proteins, secreted proteins,
nuclear proteins and chaperone proteins. Heterologous genes are sequences acquired
exogenously by the host cell. Exogenous sequences can be either integrated into the
host cell genome or maintained as episomal sequences. An exemplary class of
heterologous genes includes pathogenic genes derived from viruses, bacteria, fungi,
and protozoa.

The endogenous genes suitable for the present invention may also be
characterized based on one or more of the following features: ability to induce a
phenotypic change in a host cell or organism, species origin, developmental origin,
primary structural similarity, involvement in a particular biological process,
association with or resistance to a particular disease or disease stage, tissue-,
sub-tissue- or cell-specific expression pattern, and subcellular location of the
expressed gene product. In one aspect, the endogenous gene may be any gene
expressed in a eukaryotic cell, such as a plant cell, animal cell or a yeast cell. In
another aspect, the endogenous gene confers a phenotypic characteristic detectable by
visual, microscopic, genetic, or chemical means. Within this class of genes, of
particular interest are plant genes involved in growth phenotypes, e.g. stunting,
hyperbranching, vein banding, ring spot, etching, and those responsible for color
characteristics including bleaching and chlorosis. Also of particular relevance are
genes which upon inhibition provide an enhanced resistance to pathogens (e.g.
bacteria, fungi, viruses, insects, and protozoa), and resistance to adverse
environmental factors (e.g. temperature fluctuation, nutritional deficiency, adverse
soil conditions, moisture, dryness, etc.).

In another aspect, the endogenous genes are of a specific developmental
origin, such as those expressed in an embryo or an adult organism, during ectoderm,
mesoderm, or endoderm formation in a multi-cellular animal, or during development
of the leaves, tubers, or buds of a plant. In yet another aspect, the endogenous genes
belong to a family of genes, or a sub-family of genes, that share primary structural
similarities. Structural similarities can be discerned with the aid of the computer
software described above. Non-limiting examples of gene families include those
encoding proteinases, proteinase inhibitors, cell surface receptors, protein kinases
(e.g. tyrosine, serine/threonine or histidine kinases), trimeric G-proteins, cytokines,
and PH-, SH2-, SH3-, and PDZ-domain containing proteins, and any of those gene
families published by the Institute for Genomic Research (TIGR), Incyte
Pharmaceuticals, Inc., Human Genome Sciences Inc., Monsanto, and PE-Celera.

In yet another aspect, the endogenous genes are involved in a specific
biological process, including but not limited to cell cycle regulation, cell
differentiation, chemotaxis, apoptosis, cell motility and cytoskeletal rearrangement.

In still another aspect, the endogenous genes embodied in the invention are
associated with a particular disease or with a specific disease stage. Such genes
include but are not limited to those associated with autoimmune diseases, obesity,
hypertension, diabetes, neuronal and/or muscular degenerative diseases, cardiac
diseases, endocrine disorders, and any combinations thereof. In yet still another
aspect, the endogenous genes encompass those exhibiting restricted expression
patterns. Non-limiting exemplary gene transcripts of this class include those that are
not ubiquitously expressed, but rather are differentially expressed in one or more of
the plant tissues including leaf, seed, tuber, stem, root, and bud; or expressed in
animal body tissues including heart, liver, prostate, lung, kidney, bone marrow, blood,
skin, bladder, brain, muscles, nerves, and selected tissues that are affected by various
types of cancer (malignant or non-metastatic) or affected by cystic fibrosis or
polycystic kidney disease. Additional examples of non-ubiquitously expressed
genes are those whose gene products are localized to certain subcellular locations:
extracellular matrix, nucleus, cytoplasm, cytoskeleton, plasma and/or intracellular
membranous structures, which include but are not limited to coated pits, the Golgi
apparatus, endoplasmic reticulum, endosome, lysosome, and mitochondria.

In addition to the above-described elements, the vectors may contain a
selectable marker (for example, a gene encoding a protein necessary for the survival
or growth of a host cell transformed with the vector), although such a marker gene
can be carried on another polynucleotide sequence co-introduced into the host cell.
Only those host cells into which a selectable gene has been introduced will survive
and/or grow under selective conditions. Typical selection genes encode protein(s)
that (a) confer resistance to antibiotics or other toxic substances, e.g., ampicillin,
neomycin, methotrexate, etc.; (b) complement auxotrophic deficiencies; or (c)
supply critical nutrients not available from complex media. The choice of the proper
marker gene will depend on the host cell, and appropriate genes for different hosts
are known in the art.

The vectors embodied in this invention can be obtained using recombinant
cloning methods and/or by chemical synthesis. A vast number of recombinant
cloning techniques such as PCR, restriction endonuclease digestion and ligation are
well known in the art and need not be described in detail herein. One of skill in the
art can also use the sequence data provided herein or in public or proprietary
databases to obtain a desired vector by any synthetic means available in the art.



Host cells and transgenic organisms of the present invention:

The invention provides eukaryotic host cells transformed with the
recombinant DNA vectors described above. The recombinant vectors containing the
transgene of interest can be introduced into a suitable eukaryotic cell by any of a
number of appropriate means, including electroporation; transfection employing
calcium chloride, rubidium chloride, calcium phosphate, DEAE-dextran, or other
substances; microprojectile bombardment; lipofection; and infection (where the
vector is coupled to an infectious agent). The choice of method for introducing the
vectors will often depend on features of the host cell.

For most animal cells, any of the above-mentioned methods is suitable for
vector delivery. For plant cells, a variety of techniques derived from these general
methods is available in the art. The host cells may be in the form of whole plants,
isolated cells or protoplasts. Preferably, the cells are "intact" in that the cell
comprises an outer layer of cell wall, typically composed of cellulose, which protects
the cell and maintains its rigidity. Illustrative procedures for introducing
vectors into plant cells include Agrobacterium-mediated plant transformation,
protoplast transformation, gene transfer into pollen, injection into reproductive
organs and injection into immature embryos. As is evident to one skilled in the art,
each of these methods has distinct advantages and disadvantages. Thus, a
method that is effective for introducing genes into one plant species may not
necessarily be the most effective for another plant species.

Agrobacterium tumefaciens-mediated transfer is a widely applicable system
for introducing genes into plant cells because the DNA can be introduced into whole
plant tissues, bypassing the need for regeneration of an intact plant from a
protoplast. The use of Agrobacterium-mediated expression vectors to introduce
DNA into plant cells is well known in the art. This technique makes use of a
common feature of Agrobacterium, which colonizes plants by transferring a portion
of its DNA (the T-DNA) into a host cell, where it becomes integrated into nuclear
DNA. The T-DNA is defined by border sequences which are 25 base pairs long, and
any DNA between these border sequences is transferred to the plant cells as well.
The insertion of a recombinant plant viral nucleic acid between the T-DNA border
sequences results in transfer of the recombinant plant viral nucleic acid to the plant
cells, where the recombinant plant viral nucleic acid is replicated and then spreads
systemically through the plant. Agro-infection has been accomplished with potato
spindle tuber viroid (PSTV); CaV; (Lazarowitz, S., Nucl. Acids Res. 16:229
(1988)); digitaria streak virus (Donson et al., Virology 162:248 (1988)); wheat dwarf
virus; and tomato golden mosaic virus (TGMV). Therefore, agro-infection of a
susceptible plant could be accomplished with a virion containing a recombinant plant
viral nucleic acid based on the nucleotide sequence of any of the above viruses.
Particle bombardment, electroporation or any other methods known in the art may
also be used.

Because not all plants are natural hosts for Agrobacterium, alternative
methods such as transformation of protoplasts may be employed to introduce the
subject vectors into the host cells. For certain monocots, transformation of the plant
protoplasts can be achieved using methods based on calcium phosphate
precipitation, polyethylene glycol treatment, electroporation, and combinations of
these treatments. See, for example, Potrykus et al., Mol. Gen. Genet., 199:167-177
(1985); Fromm et al., Nature, 319:791 (1986); Callis et al., Genes and Development,
1:1183 (1987). Applicability of these techniques to different plant species may
depend upon the feasibility of regenerating that particular plant species from
protoplasts.

In addition to protoplast transformation, particle bombardment is an
alternative and convenient technique for delivering the invention vectors into a plant
host cell. Specifically, the plant cells may be bombarded with microparticles coated
with a plurality of the subject vectors. Bombardment with DNA-coated
microprojectiles has been successfully used to produce stable transformants in both
plants and animals (see, for example, Sanford et al. (1993) Methods in Enzymology,
217:483-509). Microparticles suitable for introducing vectors into a plant cell are
typically made of metal, preferably tungsten or gold. These microparticles are
available, for example, from Bio-Rad, which also supplies the PDS-1000/He particle
delivery system. Those skilled in the art will know that the particle bombardment
protocol can be optimized for any plant by varying parameters such as He pressure,
quantity of coated particles, distance between the macrocarrier and the stopping
screen, and flying distance from the stopping screen to the target.

Vectors can also be introduced into plants by direct DNA transfer into pollen
as described by Zhou et al., Methods in Enzymology, 101:433 (1983); Luo et al.,
Plant Mol. Biol. Reporter, 6:165 (1988). Alternatively, the vectors can be injected
into reproductive organs of a plant as described by Pena et al., Nature, 325:274
(1987).

Other techniques for introducing nucleic acids into a plant cell include:

(a) Hand Inoculations. Hand inoculations are performed using a neutral pH, low
molarity phosphate buffer, with the addition of celite or carborundum
(usually about 1%). One to four drops of the preparation are put onto the
upper surface of a leaf and gently rubbed.

(b) Mechanized Inoculations of Plant Beds. Plant bed inoculations are
performed by spraying (gas-propelled) the vector solution into a tractor-driven
mower while cutting the leaves. Alternatively, the plant bed is
mowed and the vector solution sprayed immediately onto the cut leaves.

(c) High Pressure Spray of Single Leaves. Single plant inoculations can also be
performed by spraying the leaves with a narrow, directed spray (50 psi, 6-12
inches from the leaf) containing approximately 1% carborundum in the
buffered vector solution.

(d) Vacuum Infiltration. Inoculations may be accomplished by subjecting a host
organism to a substantially vacuum pressure environment in order to
facilitate infection.



Once the vector is introduced into a suitable host cell, expression of the transgene
can be determined using any assay known in the art. For example, the presence of
transcribed sense or anti-sense strands of the transgene can be detected and/or
quantified by conventional hybridization assays (e.g. Northern blot analysis),
amplification procedures (e.g. RT-PCR), SAGE (U.S. Patent No. 5,695,937), and
array-based technologies (see e.g. U.S. Pat. Nos. 5,405,783, 5,412,087 and
5,445,934). In conducting these analytical procedures, it is preferable to induce
transcription of one strand of the transgene at a time. As is apparent to one skilled in
the art, the simultaneous transcription of both sense and anti-sense strands facilitates
formation of double-stranded RNA molecules, which may obscure the accurate
determination of the levels of sense and anti-sense RNA transcripts.

Expression of the transgene can also be determined by examining the protein
product. A variety of techniques are available in the art for protein analysis. They
include but are not limited to radioimmunoassays, ELISA (enzyme-linked
immunosorbent assays), "sandwich" immunoassays, immunoradiometric assays,
in situ immunoassays (using, e.g., colloidal gold, enzyme or radioisotope labels),
western blot analysis, immunoprecipitation assays, immunofluorescent assays, and
SDS-PAGE.

In general, determining the protein level involves (a) providing a biological
sample containing polypeptides; and (b) measuring the amount of any
immunospecific binding that occurs between an antibody reactive to the transgene
product and a component in the sample, in which the amount of immunospecific
binding indicates the level of expressed proteins. Antibodies that specifically
recognize and bind to the protein products of the transgene are required for
immunoassays. These may be purchased from commercial vendors or generated and
screened using methods well known in the art. See Harlow and Lane (1988) supra,
and Sambrook et al. (1989) supra. The sample of test proteins can be prepared by
homogenizing the eukaryotic transformants (e.g. plant cells) or progeny made
therefrom, and optionally solubilizing the test protein using detergents, preferably
non-reducing detergents such as Triton and digitonin. The binding reaction in which
the test proteins are allowed to interact with the detecting antibodies may be
performed in solution, or on a solid tissue sample, for example, using tissue sections
or a solid support on which the test proteins have been immobilized. The formation
of the complex can be detected by a number of techniques known in the art. For
example, the antibodies may be supplied with a label and unreacted antibodies may
be removed from the complex; the amount of remaining label thereby indicates the
amount of complex formed. Results obtained using any such assay on a sample
from a plant transformant or a progeny thereof are compared with those from a
non-transformed source as a control.
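
As a non-limiting illustration of the comparison with a non-transformed control, the following Python sketch expresses the label signal from a transformant as a fraction of the control signal after subtracting a blank (no-antigen) background. It assumes replicate readings are available and that, within the range measured, the signal is proportional to the amount of complex formed; the readings shown and the function name are hypothetical.

```python
from statistics import mean


def relative_protein_level(sample, control, blank):
    """Background-corrected label signal of a transformant relative to a
    non-transformed control; 1.0 means no change, below 1.0 a reduction.

    Each argument is a sequence of replicate readings (e.g. absorbance or
    counts) taken in the same assay run.
    """
    background = mean(blank)
    corrected_sample = mean(sample) - background
    corrected_control = mean(control) - background
    if corrected_control <= 0:
        raise ValueError("control signal must exceed the blank")
    return corrected_sample / corrected_control


if __name__ == "__main__":
    # Hypothetical triplicate readings.
    ratio = relative_protein_level(
        sample=[0.42, 0.38, 0.40],
        control=[1.10, 1.00, 0.90],
        blank=[0.10, 0.10, 0.10],
    )
    print(f"{ratio:.2f}")  # 0.33
```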

The eukaryotic host cells of this invention are grown under favorable
conditions to effect transcription of the transgene. Non-limiting examples of
eukaryotic hosts are fungal, yeast, plant, insect, avian, mammalian or other
animal cells. The host cells can be used, inter alia, as repositories of the transgene
and/or vehicles for production of the transgene-specific double-stranded RNAs. The
host cells may also be employed to generate transgenic organisms such as transgenic
animals and plants comprising the recombinant DNA vectors of the present
invention. Preferred host cells are those having the propensity to regenerate into
tissue or a whole organism. Examples of these preferred host cells are oocytes,
blastocytes, and certain plant cells exemplified herein.



Accordingly, this invention provides transgenic plants carrying the subject
vectors. In a preferred embodiment, the transgenic plant exhibits reduced
expression (when compared to a control plant) of an endogenous gene that is
substantially homologous to the transgene carried in the subject vector.

The regeneration of plants from either single plant protoplasts or various
explants is well known in the art. See, for example, Methods for Plant Molecular
Biology, Mary A. Shuler and Raymond E. Zielinski, Academic Press, Inc., San
Diego, Calif. (1988). This regeneration and growth process includes the steps of
selection of transformant cells and shoots, rooting the transformant shoots and
growth of the plantlets in soil.

The regeneration of plants containing the subject vector introduced by
Agrobacterium tumefaciens from leaf explants can be achieved as described by
Horsch et al., Science, 227:1229-1231 (1985). In this procedure, transformants are
grown in the presence of a selection agent and in a medium that induces the
regeneration of shoots in the plant species being transformed as described by Fraley
et al., Proc. Natl. Acad. Sci. U.S.A., 80:4803 (1983). This procedure typically
produces shoots within two to four weeks, and these transformant shoots are then
transferred to an appropriate root-inducing medium containing the selective agent
and an antibiotic to prevent bacterial growth. Transformant shoots that rooted in the
presence of the selective agent to form plantlets are then transplanted to soil to allow
the production of roots. These procedures will vary depending upon the particular
plant species employed, as is apparent to one of ordinary skill in the art.

A population of progeny can be produced from the first and second
transformants of a plant species by methods well known in the art, including
cross-fertilization and asexual reproduction. Transgenic plants embodied in the present
invention are useful for production of desired proteins, and as test systems for
analysis of the biological functions of a gene.



Uses of the vectors of the present invention: 

The subject vectors provide specific reagents for inhibiting expression of an
endogenous gene present in a host cell. The expression inhibition methods may be
used in a wide variety of circumstances, including suppression of a gene associated
with a particular disease or disease stage; delineating the biological functions of a
gene by analyzing a phenotypic change in the host cell that correlates with the
selective suppression of gene expression; and facilitating drug screening by
rendering the host cell more susceptible or resistant to a therapeutic agent of interest.

Accordingly, this invention provides a method of inhibiting expression of an
endogenous gene present in a eukaryotic cell. The method comprises the steps of:
(a) providing a subject vector containing a transgene that is substantially
homologous to an endogenous gene of a eukaryotic cell; (b) introducing the
recombinant vector into the eukaryotic cell; and (c) culturing the eukaryotic cell of (b)
under conditions favorable for expression of both sense and antisense RNA
transcripts from the transgene, thereby inhibiting expression of the
corresponding endogenous gene in the eukaryotic cell.

In a separate embodiment, the invention provides a method of identifying one or
more biological functions of an endogenous gene of interest in a eukaryotic cell by
selectively inhibiting the expression of the endogenous gene. The method involves:
(a) providing a recombinant vector of the present invention, wherein the transgene
contained in the vector is substantially homologous to the endogenous gene; (b)
introducing the recombinant vector of (a) into the eukaryotic cell; (c) culturing the
eukaryotic cell of (b) under conditions favorable for expression of both sense and
antisense RNA transcripts from the transgene contained in the recombinant vector,
thereby inhibiting expression of the endogenous gene in the eukaryotic cell; and
(d) determining one or more phenotypic changes in the eukaryotic cell that correlate
with the inhibited expression of the endogenous gene, thereby identifying the
biological function(s) of the endogenous gene in the eukaryotic cell.

The host cells encompassed by these embodiments are eukaryotic cells
susceptible to dsRNA-mediated "genetic interference". dsRNA-induced gene
silencing has been observed in a variety of organisms including but
not limited to worms, fruit flies, protozoa, fungi, mammals, and zebrafish. Thus,
cells from any of these exemplary organisms can be employed. Suitable host cells
may be derived from primary cultures or subcultures generated by expansion and/or
cloning of primary cultures. Any cells capable of growth in culture can be used as
host cells. Of particular interest is the type of cell that differentially expresses
(over-expresses or under-expresses) a disease-causing gene. As is apparent to one
skilled in the art, various cell lines may be obtained from public or private
repositories. The largest depository agent is the American Type Culture Collection
(http://www.atcc.org), which offers a diverse collection of well-characterized cell
lines derived from a vast number of organisms and tissue samples.

Upon delivery of the subject vectors, the host cells are cultured under
conditions favorable for gene transcription. The parameters governing eukaryotic
cell survival are generally applicable for induction of gene transcription. The culture
conditions are well established in the art. Physicochemical parameters which may
be controlled in vitro are, e.g., pH, CO2, temperature, and osmolality. The
nutritional requirements of cells are usually provided in standard media formulations
developed to provide an optimal environment. Nutrients can be divided into several
categories: amino acids and their derivatives, carbohydrates, sugars, fatty acids,
complex lipids, nucleic acid derivatives and vitamins. Apart from nutrients for
maintaining cell metabolism, most cells also require one or more hormones from at
least one of the following groups: steroids, prostaglandins, growth factors, pituitary
hormones, and peptide hormones to survive or proliferate (Sato, G.H., et al. in
"Growth of Cells in Hormonally Defined Media", Cold Spring Harbor Press, N.Y.,
1982; Barnes and Sato (1980) Anal. Biochem., 102:255). Given the vast wealth of
information on nutrient requirements and on medium conditions optimized for cell
survival, one skilled in the art can readily fashion various culture conditions using
any one of the aforementioned methods and compositions, alone or in any
combination.

The inhibition of expression of the endogenous gene sharing substantial
sequence homology with the transgene carried in the vectors can be determined by
assaying for a difference, between the host cell and the control cell, in the level of
mRNA transcripts of the endogenous gene. Alternatively, a suppression in
expression is determined by detecting a difference in the level of the polypeptide(s)
encoded by the endogenous gene. A preferred method is to detect a phenotypic
change resulting from the decrease in expression of the endogenous gene of interest.

In assaying for an alteration in mRNA level, nucleic acid contained in the
host cells is first extracted according to standard methods in the art. For instance,
mRNA can be isolated using various lytic enzymes or chemical solutions according
to the procedures set forth in Sambrook et al. (1989), supra, or extracted by
nucleic-acid-binding resins following the accompanying instructions provided by
manufacturers. The mRNA contained in the extracted nucleic acid sample is then
detected by hybridization (e.g. Northern blot analysis) and/or amplification
procedures according to methods widely known in the art or based on the methods
exemplified herein.
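
Where the amplification procedure is quantitative RT-PCR, one widely used way to express the difference between the host cell and the control cell is the comparative threshold-cycle (2^-ΔΔCt) method of Livak and Schmittgen, which is substituted here for illustration and is not a procedure recited herein. The Python sketch below normalizes the endogenous target to a reference transcript measured in the same samples. It assumes approximately 100% amplification efficiency for both amplicons; the Ct values and the function name are hypothetical.

```python
def relative_expression_ddct(ct_target_host: float, ct_ref_host: float,
                             ct_target_ctrl: float, ct_ref_ctrl: float) -> float:
    """Fold change of an endogenous target mRNA in the host cell relative to
    the control cell by the comparative Ct (2^-ddCt) method.

    Each Ct is the cycle at which amplification crosses a fixed threshold;
    the reference transcript corrects for differences in input RNA.
    Assumes ~100% amplification efficiency (doubling per cycle).
    """
    d_ct_host = ct_target_host - ct_ref_host
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    dd_ct = d_ct_host - d_ct_ctrl
    return 2.0 ** (-dd_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: the target appears 3 cycles later in the host.
    fold = relative_expression_ddct(27.0, 18.0, 24.0, 18.0)
    print(fold, f"{(1 - fold) * 100:.1f}% reduction")  # 0.125 87.5% reduction
```

A fold change below 1.0 indicates reduced expression of the endogenous gene in the host cell. Because each cycle is treated as an exact doubling, the result becomes unreliable when the target and reference amplicons differ markedly in efficiency.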

Reduction in expression of the endogenous gene can also be determined by examining the protein product of the endogenous gene. A variety of techniques are available in the art for protein analysis. They include but are not limited to radioimmunoassays, ELISA (enzyme-linked immunosorbent assays), "sandwich" immunoassays, immunoradiometric assays, in situ immunoassays (using, e.g., colloidal gold, enzyme or radioisotope labels), western blot analysis, immunoprecipitation assays, immunofluorescent assays, and SDS-PAGE. In addition, cell sorting analysis can be employed to detect cell surface antigens. Such analysis involves labeling target cells with antibodies coupled to a detectable agent, and then separating the labeled cells from the unlabeled ones in a cell sorter. A sophisticated cell separation method is fluorescence-activated cell sorting (FACS): cells traveling in single file in a fine stream are passed through a laser beam, and the fluorescence of each cell bound by the fluorescently labeled antibodies is measured.
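For a cell-sorting readout, one common way to score labeled cells is to set a fluorescence gate from a negative-control population and then count the sample events that fall above it. The disclosure does not prescribe a gating rule. This Python sketch assumes that per-cell intensities are already available as numbers. It takes the nearest-rank 99th percentile of the negative control as the gate, which is an illustrative choice. The function names and data are hypothetical.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, pct in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))  # 1-based rank
    return ordered[rank - 1]


def fraction_positive(sample, negative_control, pct=99.0):
    """Fraction of sample events brighter than a gate set on the negative control."""
    gate = nearest_rank_percentile(negative_control, pct)
    return sum(1 for x in sample if x > gate) / len(sample)


# Hypothetical per-cell intensities (arbitrary fluorescence units).
unstained = list(range(1, 101))
stained = [50, 100, 150, 200]
print(fraction_positive(stained, unstained))  # 0.75
```

If silencing lowers a surface antigen, the positive fraction would be expected to fall relative to non-silenced control cells stained the same way.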
Antibodies that specifically recognize and bind to the protein products of interest are required for conducting the aforementioned protein analyses. These antibodies may be purchased from commercial vendors or generated and screened using methods well known in the art. See Harlow and Lane (1988), supra, and Sambrook et al. (1989), supra.

Inhibition of gene expression can also result in phenotypic change(s) in a host cell. As used herein, phenotypic change refers to any non-genotypic change that can be detected visually, or analyzed biochemically or genetically. The choice of detection methods will largely depend on the nature of the phenotypic characteristics under investigation. For instance, certain phenotypic features of a plant cell can be detected microscopically or macroscopically. These features include improved tolerance to herbicides; improved tolerance to extremes of heat or cold, drought, salinity or osmotic stress; improved resistance to pests (insects, nematodes or arachnids) or diseases (fungal, bacterial or viral); production of enzymes or secondary metabolites; male or female sterility; dwarfness; early maturity; improved yield, vigor, heterosis, nutritional qualities, flavor or processing properties; and the like. Other detectable phenotypic changes are morphological alterations including but not limited to stunting, hyperbranching, vein banding, ring spot, etching, and those responsible for color characteristics including bleaching and chlorosis.

For animal cells, detectable phenotypic changes may encompass alterations in cell cycle regulation, cell differentiation, apoptosis, chemotaxis, cell motility and cytoskeletal rearrangement. Methods for detecting these phenotypic changes are well established in the art and hence are not detailed herein.

Other phenotypic changes commonly observed in both plant and animal cells involve differential expression (over-expression or under-expression) of a particular protein due to the selective inhibition of the endogenous gene of interest. Differential gene expression may be analyzed by any chemical means available in the art or those disclosed herein. As is also apparent to artisans, altering the expression of one endogenous gene may change the expression profile of a host of genes mapped to the same or related signal transduction pathways. As used herein, "signal transduction" refers to the process by which stimulatory or inhibitory signals are transmitted into and within a cell to elicit an intracellular response. Any fluctuation in the intracellular response of a eukaryotic host cell is also considered a type of phenotypic change.

Alteration in intracellular response is often determined with the aid of reporter molecules. For example, when examining a signaling cascade involving a fluctuation of intracellular pH, pH-sensitive molecules such as fluorescent pH dyes can be used as the reporter molecules. In another example, where the signaling pathway of a trimeric Gq protein is analyzed, calcium-sensitive fluorescent probes can be employed as reporters. As is apparent to artisans in the field of signal transduction, the trimeric Gq protein is involved in a classic signaling pathway in which activation of Gq stimulates hydrolysis of phosphoinositides by phospholipase C to generate two classes of well-characterized second messengers, namely diacylglycerol and inositol phosphates. The latter stimulate the mobilization of calcium from intracellular stores, resulting in a transient surge of intracellular calcium concentration, which is a readout measurable with a calcium-sensitive probe.
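A calcium-probe readout is often summarized as ΔF/F0, which is the change in fluorescence relative to a pre-stimulus baseline. The Python sketch below illustrates that normalization and is not a procedure taken from this disclosure. It assumes a fluorescence trace sampled at regular intervals, in which the first `baseline_frames` points are recorded before stimulation. The names and the trace values are hypothetical.

```python
from statistics import mean


def delta_f_over_f0(trace, baseline_frames):
    """Return (F - F0) / F0 for each point, with F0 the mean of the first frames."""
    if not 0 < baseline_frames <= len(trace):
        raise ValueError("baseline_frames must lie within the trace")
    f0 = mean(trace[:baseline_frames])
    if f0 == 0:
        raise ValueError("baseline fluorescence must be non-zero")
    return [(f - f0) / f0 for f in trace]


def peak_response(trace, baseline_frames):
    """Height of the calcium transient as a fraction of baseline."""
    return max(delta_f_over_f0(trace, baseline_frames))


trace = [100, 100, 100, 150, 200, 120]  # hypothetical probe readings
print(peak_response(trace, 3))          # 1.0
```

Comparing peak responses between silenced and control cells is one way to quantify a change in the Gq-triggered calcium surge described above.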

Another exemplary class of reporter molecules is a reporter gene operably linked to an inducible promoter that can be activated upon the stimulation or inhibition of a signaling pathway. Reporter proteins can also be linked with other proteins whose expression is dependent upon the stimulation or suppression of a given signaling cascade. Commonly employed reporter proteins can be easily detected by a colorimetric or fluorescent assay. Non-limiting examples of such reporter proteins include β-galactosidase, β-lactamase, chloramphenicol acetyltransferase (CAT), luciferase, green fluorescent protein (GFP) and their derivatives. Those skilled in the art will know of other suitable reporter molecules for assaying changes in a specific signal transduction readout, or will be able to ascertain such using routine experimentation.

To discern inhibition of gene expression, one typically conducts a comparative analysis of the subject and appropriate controls. Preferably, a test includes a positive control sample exhibiting a decrease in gene expression and a negative control having an unaltered expression level. The selection of an appropriate control cell or tissue depends on the sample cell or tissue initially selected and the phenotype under investigation.
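With a positive and a negative control in hand, a readout (mRNA, protein or reporter signal) can be placed on a 0 to 100% inhibition scale. This minimal Python sketch illustrates one such normalization; the disclosure does not specify it. The sketch assumes that the negative control defines 0% inhibition and the positive control defines 100%. It also assumes that all three values are measured in the same units. The function name and numbers are hypothetical.

```python
def percent_inhibition(sample, negative_control, positive_control):
    """Place a readout on a scale where the negative control is 0% inhibition
    (unaltered expression) and the positive control is 100%."""
    window = negative_control - positive_control
    if window == 0:
        raise ValueError("controls must differ to define an inhibition window")
    return 100.0 * (negative_control - sample) / window


# Hypothetical normalised expression levels (same units for all three).
print(round(percent_inhibition(40.0, 100.0, 10.0), 1))  # 66.7
```

Values outside 0 to 100% are not clipped. They can flag samples that are more strongly silenced than the positive control or that show increased expression.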
In one aspect, the invention methods can be employed to selectively inhibit expression of an endogenous gene that is native to the eukaryotic host cell. Such a gene may encode a protein selected from the group consisting of a membrane protein, a cytosolic protein, a secreted protein, a nuclear protein and a chaperone protein. Of particular interest are endogenous genes whose inhibition, at the level of expression and/or function, results in phenotypic changes. In another aspect within this embodiment, the endogenous gene is heterologous to the host cell. As used herein, heterologous genes are acquired exogenously by the host cell. Non-limiting examples of heterologous genes are those derived from viruses, bacteria, fungi, and protozoa.

In a separate embodiment, the invention methods are used to identify a biological function(s) of an endogenous gene in a eukaryotic cell by examining a phenotypic change associated with the inhibition of its expression, and thus the loss of its biological function. In essence, the subject methods allow the creation of a transient or longer-term gene-specific knock-out system for analyzing the biological function of any endogenous gene of interest.

Kits comprising the vectors of the present invention 

The present invention also encompasses kits containing the vectors of this invention in suitable packaging. Kits embodied by this invention include those that allow generation of a double-stranded RNA transcript in a eukaryotic cell.

Each kit necessarily comprises the reagents that make delivery of the vectors into a eukaryotic host cell possible. The selection of reagents that facilitate delivery of the vectors may vary depending on the particular transfection or infection method used. The kits may also contain reagents useful for generating labeled polynucleotide probes or proteinaceous probes for detection of gene silencing. Each reagent can be supplied in solid form or dissolved/suspended in a liquid buffer suitable for storage, and later for exchange or addition into the reaction medium when the experiment is performed. Suitable packaging is provided. The kit can optionally provide additional components that are useful in the procedure. These optional components include, but are not limited to, buffers, capture reagents, developing reagents, labels, reacting surfaces, means for detection, control samples, instructions, and interpretive information. The kits can be employed to generate eukaryotic cells whose endogenous genes are selectively inhibited, and transgenic organisms comprising these eukaryotic cells.

Further illustration of the development and use of vectors and assays according to this invention is provided in the Examples section below. The examples are provided as a guide to a practitioner of ordinary skill in the art and are not meant to be limiting in any way.

EXAMPLES

Example 1: Construction of recombinant vectors comprising two opposing transcription units

We have designed a recombinant vector construct useful for silencing nuclear genes in many of the agriculturally important cereal crops. The vector comprises sequences derived from maize streak geminivirus isolate MSV-Kom (GenBank accession number AF003952; classification: family Geminiviridae, genus Mastrevirus, species Maize streak virus), designated MSV-Komatipoort. Maize streak virus has a broad host range that encompasses all agriculturally important cereal crops, including but not limited to corn, wheat, rice, barley, rye, sorghum and millet. Methods for the construction of infectious geminivirus clones are well known to those skilled in the art and are described in European patent application 8687015.5 as well as in US Patent No. 5,569,597.

We have synthesized a 1618 base pair DNA fragment that contains the MSV-Kom repA and repB genes, the long intergenic region (LIR) and the short intergenic region (SIR), and thus all sequences that are required for viral replication (Palmer et al. (1999) Archives of Virology 144:1345-1360). This fragment was cloned into the pZeRO-2 vector (Invitrogen) as an EcoRI-XbaI fragment to create the plasmid pMSVLSB-1, the sequence of which is shown in Figure 4. A 171 base pair fragment containing the movement protein (mp) promoter of MSV-Kom is synthesized and cloned into the pZeRO-2 vector as a HindIII-EcoRI fragment to create pMSVLSB-2 (sequence shown in Figure 5). The ApaI fragment containing the mp promoter is inserted between the two ApaI sites in pMSVLSB-1 to create pMSVLSB-3 (sequence shown in Figure 6).

The cauliflower mosaic virus 35S RNA promoter (CaMV 35S promoter) sequence is amplified by PCR with a vector containing this sequence (pBI121, from Clontech) as template DNA, using the following primers, which contain an EcoRI site in CaMV35SF and a SalI site in CaMV35SR:

CaMV35SF:
TTTGAATTCGTCAACATGGTGGAGCAC (SEQ ID NO:1)

CaMV35SR:
TTTGTCGACGTCCTCTCCAAATGAAATGAAC (SEQ ID NO:2)

The CaMV 35S promoter PCR product is digested with EcoRI and SalI, and the restricted fragments are purified.



The zeocin resistance gene is amplified by PCR with the vector pZeRO-1 (Invitrogen) as template, using the following primers, which contain SalI, PacI and NotI sites in ZeoF and XhoI, PacI and NotI sites in ZeoR:

ZeoF:
CCCGTCGACTTAATTAAGCGGCCGCGTTTACAATTTCGCCTGATGC (SEQ ID NO:3)

ZeoR:
CCCCTCGAGTTAATTAAGCGGCCGCCTCAAAAAGGATCTTCACCTAG (SEQ ID NO:4)

The zeocin resistance gene PCR product is digested with XhoI and SalI and purified.
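The restriction sites built into these primers can be checked by scanning each primer for the enzymes' recognition sequences. The Python sketch below hard-codes the standard recognition sequences of the enzymes named in this Example. All of these sites are palindromic, so scanning only the top strand is sufficient. The sketch reports 0-based start positions. Applied to ZeoF, it locates the SalI, PacI and NotI sites described above. The table and function names are illustrative.

```python
# Standard recognition sequences (5'->3') of the enzymes used in Example 1.
# All are palindromic, so a top-strand scan also covers the bottom strand.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "SalI": "GTCGAC",
    "XhoI": "CTCGAG",
    "SpeI": "ACTAGT",
    "XbaI": "TCTAGA",
    "NotI": "GCGGCCGC",
    "PacI": "TTAATTAA",
    "ApaI": "GGGCCC",
    "HindIII": "AAGCTT",
}


def find_sites(seq, enzymes=RECOGNITION_SITES):
    """Return {enzyme: [0-based start positions]} for every site found in seq."""
    seq = seq.upper().replace(" ", "")
    hits = {}
    for name, site in enzymes.items():
        positions, start = [], seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)  # also catches overlapping matches
        if positions:
            hits[name] = positions
    return hits


print(find_sites("CCCGTCGACTTAATTAAGCGGCCGCGTTTACAATTTCGCCTGATGC"))
# {'SalI': [3], 'NotI': [17], 'PacI': [9]}
```

The same check can be run on a sequenced clone to confirm that the expected sites survived cloning.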

The nopaline synthase (nos) terminator sequence is amplified by PCR with the vector pBI121 (Clontech) as template, using the following primers, which contain an XhoI site in NosF and a SpeI site in NosR:

NosF:
TTTCTCGAGCGAATTTCCCCGATCGTTCAAAC (SEQ ID NO:5)

NosR:
TTTACTAGTCCCGATCTAGTAACATAGATGAC (SEQ ID NO:6)
The nos terminator PCR product is digested with XhoI and SpeI and purified.

The digested CaMV 35S promoter, zeocin resistance gene and nos terminator sequences are ligated together with T4 DNA ligase. The ligation product is diluted 1:100 in sterile water, and the whole ligation product is re-amplified with the CaMV35SF and NosR primers. The resulting PCR product is digested with EcoRI and SpeI, purified and ligated with pMSVLSB-3 that has been pre-digested with EcoRI and SpeI. The ligation reaction is used to transform competent E. coli cells. Transformants are selected on Luria agar plates containing both kanamycin (100 µg/ml) and zeocin (50 µg/ml) to select for colonies containing the CaMV 35S promoter-zeocin resistance gene-nos terminator cassette inserted into pMSVLSB-3 (Figure 6 and SEQ ID NO:11). Colonies putatively containing the correct plasmid are chosen, and plasmid DNA is isolated and screened by digestion with EcoRI and SpeI. One plasmid, designated pMSVLSB-4 (Figure 7 and SEQ ID NO:12), is selected.

One method in the art for constructing infectious clones of geminivirus genomes is to clone tandemly duplicated sequences of the geminivirus genome, with at least the LIR duplicated. This allows the virus sequence to escape from the cloning vector in planta by a replicative release mechanism. The virus Rep protein is transiently expressed in transfected cells and introduces a nick at each of the stem-loop sequences contained within the origin of replication in the LIR. Rolling-circle replication is initiated at each nick point, which results in the release of a single-stranded DNA (ssDNA) copy of the virus replicon. This copy is circularized by the Rep protein and then replicates autonomously in the plant cell nucleus. The XbaI-SpeI fragment from pMSVLSB-3, containing the viral LIR and Rep genes, is inserted into the unique SpeI site in pMSVLSB-4 to create pMSVLSB-5 (Figure 8 and SEQ ID NO:13). The zeocin resistance gene is deleted by digestion with NotI, and the DNA is recircularized to yield a new vector, pMSVLSB-6 (Figure 9 and SEQ ID NO:14), which is used to transform E. coli to kanamycin resistance. When the vector is introduced into plant cells, a monomeric copy of the insert is released by replicative release (described above) and replicates autonomously as construct MSVLSB-6 in the nuclei of infected cells.
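Several cloning steps join ends made by different enzymes. For example, an XbaI-SpeI fragment is inserted into a SpeI site here, and an XbaI-SpeI fragment is later cloned into an XbaI-digested binary vector. These joins work because XbaI (T^CTAGA) and SpeI (A^CTAGT) leave the same 5'-CTAG overhang. The Python sketch below models each enzyme by its recognition sequence and its top- and bottom-strand cut positions. It checks whether two ends can anneal and reports the junction sequence that results. The data structure is an illustrative assumption and covers only enzymes that leave 5' overhangs.

```python
from typing import NamedTuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


class Enzyme(NamedTuple):
    site: str        # recognition sequence, 5'->3' on the top strand
    top_cut: int     # top strand is cut just before this index of `site`
    bottom_cut: int  # bottom-strand cut, expressed in top-strand coordinates


# Only enzymes leaving 5' overhangs (top_cut < bottom_cut) are modelled here.
ENZYMES = {
    "EcoRI": Enzyme("GAATTC", 1, 5),
    "SalI": Enzyme("GTCGAC", 1, 5),
    "XhoI": Enzyme("CTCGAG", 1, 5),
    "XbaI": Enzyme("TCTAGA", 1, 5),
    "SpeI": Enzyme("ACTAGT", 1, 5),
    "NotI": Enzyme("GCGGCCGC", 2, 6),
}


def overhang(name: str) -> str:
    """Single-stranded 5' overhang (read 5'->3') left by the enzyme."""
    e = ENZYMES[name]
    return e.site[e.top_cut:e.bottom_cut]


def compatible(a: str, b: str) -> bool:
    """Two 5' overhangs anneal if one is the reverse complement of the other."""
    return overhang(a) == revcomp(overhang(b))


def hybrid_site(left: str, right: str) -> str:
    """Junction sequence when a `left`-cut end is ligated to a `right`-cut end."""
    if not compatible(left, right):
        raise ValueError(f"{left} and {right} ends are not compatible")
    l, r = ENZYMES[left], ENZYMES[right]
    return l.site[:l.top_cut] + r.site[r.top_cut:]


print(overhang("XbaI"), overhang("SpeI"))  # CTAG CTAG
print(compatible("XbaI", "SpeI"))          # True
print(hybrid_site("XbaI", "SpeI"))         # TCTAGT
```

A junction formed this way (TCTAGT or ACTAGA) is recognized by neither XbaI nor SpeI. Intact sites therefore remain only at the ends of the inserted fragment that were not joined to a hybrid partner.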
The restriction map of construct MSVLSB-6 is shown in Figure 3. This genetic construct possesses the following features: (a) the rep genes and origins of replication from maize streak geminivirus that are necessary and sufficient for the autonomous replication of the viral construct and its associated foreign DNA in the host plant cell; and (b) two overlapping transcription units present in the DNA replicon. The two overlapping transcription units are arranged according to the configuration shown in Figure 2. With reference to Figure 2, "promoter 1" and "terminator 1" in MSVLSB-6 are the MSV mp promoter and the transcription termination signals present in the SIR, respectively, and "promoter 2" and "terminator 2" are the CaMV 35S RNA promoter and the nos terminator sequences, respectively. The two overlapping transcription units share three unique restriction sites (SalI, PacI and NotI) and one non-unique restriction site (XhoI) where foreign DNA may be inserted so that it is transcribed by both promoters to yield an at least partially double-stranded RNA duplex of the foreign DNA sequence.
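The core idea of the vector can be illustrated with a toy model: one insert is read by two opposing promoters. The transcript from one promoter carries the insert sequence, and the transcript from the other carries its reverse complement. Over the insert, the two transcripts are therefore fully complementary. The Python sketch considers the insert alone. In real transcripts, the vector-derived sequences flanking the insert would not be expected to pair, which is one plausible reading of "at least partially double-stranded". The function names and the example insert are hypothetical, and G-U wobble pairs are ignored.

```python
# Watson-Crick pairs in RNA (G-U wobble pairs are deliberately ignored).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_rna(dna: str) -> str:
    """RNA copy of a DNA sequence read 5'->3' (T -> U)."""
    return dna.upper().replace("T", "U")


def transcripts(insert: str) -> tuple[str, str]:
    """Sense RNA (read from promoter 1) and antisense RNA (read from promoter 2)."""
    sense = to_rna(insert)
    antisense = to_rna(insert.upper().translate(DNA_COMPLEMENT)[::-1])
    return sense, antisense


def paired_fraction(rna1: str, rna2: str) -> float:
    """Fraction of positions that base-pair when the 5' end of rna1 is aligned
    opposite the 3' end of rna2 (antiparallel alignment)."""
    paired = sum((a, b) in PAIRS for a, b in zip(rna1, reversed(rna2)))
    return paired / max(len(rna1), len(rna2))


sense, antisense = transcripts("ATGCGT")
print(sense, antisense)                   # AUGCGU ACGCAU
print(paired_fraction(sense, antisense))  # 1.0
```

Because both strands are transcribed from the same template, insert orientation does not change the duplex that forms. It only determines which promoter produces the sense strand.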




Example 2: Use of recombinant vectors to inhibit or silence gene expression in cereal crops

Application of pMSVLSB-6 in inhibition of Dwarf1 gene expression in rice



The vector pMSVLSB-6 exemplified above can be employed to inhibit expression of any endogenous gene in a variety of plant host cells. By way of illustration, the rice gene Dwarf1 is inhibited to reproduce a known mutant phenotype, using a pMSVLSB-6 derivative containing a fragment of the coding sequence of Dwarf1 (GenBank accession number AB028602). The gene fragment is amplified from cDNA prepared from rice seedlings, using primers designed to have homology with the published sequence of Dwarf1 (Ashikari et al. (1999) PNAS U.S.A. 96:10284-10289). The primer sequences contain NotI restriction sites at their 5' ends. The PCR product is digested with NotI and cloned into the NotI site of pMSVLSB-6 to generate pMSVLSB-6::dwarf1s and pMSVLSB-6::dwarf1a, with the insert cloned in the sense and antisense orientations, respectively, with respect to the MSV mp promoter. The XbaI-SpeI fragment from each of these plasmids is transferred into an Agrobacterium binary vector that is commonly used for rice transformation. This vector is used to transform electrocompetent Agrobacterium strain LBA4404 (Life Technologies). Agrobacterium cultures containing the appropriate plasmids are used in the transformation of rice. Transgenic rice is generated by standard protocols (see, e.g., US Patent 5,591,616). The transgenic rice plants display phenotypes similar to those of the dwarf1 mutant described by Ashikari et al. (1999), supra: they are gibberellin-insensitive, dwarfed in comparison with un-silenced transgenic controls, and have broad, dark green leaves, compact panicles and short, round grains.

Application of pMSVLSB-6 in inhibition of phytoene desaturase expression in maize seedlings
The coding sequence for the maize phytoene desaturase gene (pds), GenBank accession number U37285, is amplified from cDNA made from RNA isolated from four-day-old maize seedlings of the cultivar "Golden Cross Bantam". The primers used for amplification of this cDNA have the following sequences, each containing a PacI site at the 5' end:

zeapds1330:
TTTTTAATTAAGGTCCGCCTGAATTCTCG (SEQ ID NO:7)

zeapds1873:
TTTTTAATTAACGGCAAGGCTCACAGTTTG (SEQ ID NO:8)

PCR amplification with these primers and cDNA made from RNA isolated from maize seedlings yields a product of 565 base pairs, which is then digested with PacI. pMSVLSB-5, the progenitor plasmid of pMSVLSB-6, is digested with XbaI and SpeI to release the MSV and associated overlapping transcription unit sequences from the pZeRO-2 cloning vector as a single 4816 base pair fragment. This fragment is cloned into the Agrobacterium binary vector pBin19 (GenBank: U09365) digested with XbaI to yield pMSVLSB-7. The plasmid pMSVLSB-7 is digested with PacI and the pds PCR fragment is inserted into this position, generating plasmids pMSVLSB-7::pds1 (cloned in the sense orientation with respect to the MSV mp promoter) and pMSVLSB-7::pds2 (cloned in the antisense orientation with respect to the MSV mp promoter). These two plasmids are each introduced into Agrobacterium strain C58C1(pMP90) (Koncz and Schell, 1985) by electroporation. The Agrobacterium containing the binary vector plasmids is grown overnight in Luria-Bertani medium containing appropriate selective antibiotics. The bacterial suspension is loaded into a 100 µl Hamilton syringe and injected into three-day-old maize seedlings (cultivar Golden Cross Bantam) according to methods described by Escudero et al. (1994) in the chapter "Agroinfection" of The Maize Handbook, Freeling M, Walbot V (eds). Plants that are successfully agroinfected display a photobleaching phenotype on the first three leaves, similar to that induced by spraying the plants with the phytoene desaturase inhibitor norflurazon.

CLAIMS

What is claimed is: 

1. A eukaryotic recombinant vector comprising a viral replicon having two overlapping transcription units arranged in an opposing orientation and flanking a transgene of interest, wherein the two overlapping transcription units yield both sense and antisense RNA transcripts from the same transgene in a eukaryotic host cell.

2. The eukaryotic recombinant vector of claim 1, wherein each of the overlapping transcription units comprises a promoter and a terminator.

3. The eukaryotic recombinant vector of claim 2, wherein the promoter is a constitutive promoter.

4. The eukaryotic recombinant vector of claim 2, wherein the promoter is an inducible promoter.

5. The eukaryotic recombinant vector of claim 2, wherein the promoter is a tissue-specific promoter.

6. The eukaryotic recombinant vector of claim 1, wherein the promoter and 
the terminator of the overlapping transcription units are arranged in a configuration 
shown in Figure 2(a). 


7. The eukaryotic recombinant vector of claim 1, wherein the promoter and 
the terminator of the overlapping transcription units are arranged in a configuration 
shown in Figure 2(b). 

8. The eukaryotic recombinant vector of claim 1, wherein the promoter and the terminator of the overlapping transcription units are arranged in a configuration shown in Figure 2(c).
9. The eukaryotic recombinant vector of claim 1, wherein the promoter and
the terminator of the overlapping transcription units are arranged in a configuration 
shown in Figure 2(d). 

10. The eukaryotic recombinant vector of claim 1 that inhibits gene expression of the eukaryotic host cell.

11. The eukaryotic recombinant vector of claim 1, wherein the eukaryotic host cell is selected from the group consisting of fungus, yeast cell, plant cell and animal cell.

12. The eukaryotic recombinant vector of claim 1 that inhibits expression of an endogenous gene present in the host cell, wherein the endogenous gene is substantially homologous to the transgene contained in the overlapping transcription units.

13. The eukaryotic recombinant vector of claim 12, wherein the endogenous gene is native to the host cell.

14. The eukaryotic recombinant vector of claim 12, wherein the endogenous gene is heterologous to the host cell.

15. The eukaryotic recombinant vector of claim 12, wherein the endogenous gene is a pathogenic gene derived from one or more members of the group consisting of virus, bacterium, fungus, and protozoa.

16. The eukaryotic recombinant vector of claim 1, wherein expression of the transgene to yield double-stranded RNA transcripts confers a phenotypic change in the eukaryotic host cell.

17. The eukaryotic recombinant vector of claim 1, wherein the transgene encodes a protein selected from the group consisting of a membrane protein, a cytosolic protein, a secreted protein, a nuclear protein, and a chaperone protein.

18. The eukaryotic recombinant vector of claim 1 that is an autonomously replicating vector.

19. The eukaryotic recombinant vector of claim 1, wherein the viral replicon is derived from a DNA virus.

20. The eukaryotic recombinant vector of claim 19, wherein the DNA virus is selected from the group consisting of Geminivirus, Caulimoviridae, Badnaviridae, Circoviridae, Circinoviridae, Parvoviridae, Papovaviridae, Polyomaviridae, Adenoviridae, Herpesviridae, Poxviridae, Iridoviridae, Baculoviridae, Hepadnaviridae, Retroviridae, Gyrovirus, Nanovirus, and African Swine Fever virus.

21. A host cell transformed with a vector of claim 1 or 10. 

22. The host cell of claim 21 that is a eukaryotic cell selected from the group 
consisting of fungus, yeast cell, plant cell and animal cell. 


23. A transgenic plant comprising a eukaryotic recombinant vector of claim 1 or 10.

24. The transgenic plant of claim 23 exhibiting reduced expression of an endogenous gene that is substantially homologous to the transgene contained in the eukaryotic recombinant vector.

25. A kit for generating a double-stranded RNA transcript in a eukaryotic 
cell comprising a eukaryotic recombinant vector of claim 1 in suitable packaging. 


26. A method of inhibiting expression of an endogenous gene present in a 
eukaryotic cell, comprising: 

(a) providing a eukaryotic recombinant vector of claim 12; 






WO 01/77350 



PCT/US01/11436 



(b) introducing the eukaryotic recombinant vector into the eukaryotic 
cell; 

(c) culturing the eukaryotic cell of (b) under conditions favorable for expression of both sense and antisense RNA transcripts from the transgene that is contained in the transcription units of the vector, and thereby inhibiting expression of the corresponding endogenous gene in the eukaryotic cell.



27. The method of claim 26, wherein the endogenous gene is native to the host cell.



28. The method of claim 26, wherein the endogenous gene is heterologous to 
the host cell. 

29. The method of claim 26, wherein the endogenous gene is a pathogenic gene derived from one or more members of the group consisting of virus, bacterium, fungus, and protozoa.

30. The method of claim 26, wherein inhibition of the endogenous gene confers a phenotypic change in the host cell.

31. The method of claim 26, wherein the host eukaryotic cell is selected from 
the group consisting of fungus, yeast cell, plant cell, and animal cell. 

32. The method of claim 26, wherein the eukaryotic recombinant vector is an autonomously replicating vector.

33. The method of claim 26, wherein the eukaryotic recombinant vector 
comprises a viral replicon derived from a DNA virus. 






34. The method of claim 26, wherein the DNA virus is selected from the group consisting of Geminivirus, Caulimoviridae, Badnaviridae, Circoviridae, Circinoviridae, Parvoviridae, Papovaviridae, Polyomaviridae, Adenoviridae, Herpesviridae, Poxviridae, Iridoviridae, Baculoviridae, Hepadnaviridae, Retroviridae, Gyrovirus, Nanovirus, and African Swine Fever virus.

35. The method of claim 26, wherein the eukaryotic recombinant vector comprises two overlapping transcription units, wherein each transcription unit comprises a promoter and a terminator.

36. The method of claim 26, wherein the promoter is a constitutive promoter. 

37. The method of claim 26, wherein the promoter is an inducible promoter.

38. The method of claim 26, wherein the promoter is a tissue-specific 
promoter. 

39. The method of claim 35, wherein the promoter and the terminator of the overlapping transcription units are arranged in a configuration shown in Figure 2(a).

40. The method of claim 35, wherein the promoter and the terminator of the 
overlapping transcription units are arranged in a configuration shown in Figure 2(b). 


41. The method of claim 35, wherein the promoter and the terminator of the 
overlapping transcription units are arranged in a configuration shown in Figure 2(c). 

42. The method of claim 35, wherein the promoter and the terminator of the overlapping transcription units are arranged in a configuration shown in Figure 2(d).

43. A method of identifying a biological function(s) of an endogenous gene of interest in a eukaryotic cell by selectively inhibiting the expression of the endogenous gene, the method comprising:

(a) providing a eukaryotic recombinant vector of claim 12;

(b) introducing the eukaryotic recombinant vector of (a) into the eukaryotic cell;
(c) culturing the eukaryotic cell of (b) under conditions favorable for
expression of both sense and antisense RNA transcripts from the 
transgene contained in the eukaryotic recombinant vector and thereby 
inhibiting expression of the endogenous gene in the eukaryotic cell; 
and 

(d) determining one or more phenotypic changes in the eukaryotic cell 
that correlate with the inhibited expression of the endogenous gene, 
thereby identifying the biological function(s) of the endogenous gene 
in the eukaryotic cell. 

44. The method of claim 43, wherein the eukaryotic cell is selected from 
the group consisting of fungus, yeast cell, plant cell, and animal cell. 

45. The method of claim 43, wherein the eukaryotic cell is a plant cell. 

46. The method of claim 43, wherein the eukaryotic cell is an animal cell. 
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Figure 4 

pMSVLSB-1: 4881 bp; 

Composition 1161 A; 1260 C; 1251 Gj 1209 T; 0 OTHER 
Percentage: 24% A; 26% C; 26% G; 25% T; 0 %OTHER 

Molecular Weight (kDa) ssDNA: 1506-65 dsDNA: 3009.2 
ORIGIN 

1 AGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 

61 * ACGACAGGTT TCCCGACTGG AAAG CGGG CA GTGAGCG CAA CGCAATTAAT GTGAGTTAGC 

121 TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG 'TTGTGTGGAA 

181 TTGTGAGCGG ATAACAATTT CACACAGGAA ACAGCTATGA CCATGATTAC GCCAAGCTAT 

241 TTAGGTGACA CTATAGAATA CTCAAGCTAT GCATCAAGCT TGGTACCGAG CTCGGATCCA 

3 01 CTAGTAACGG CCGCCAGTGT GCTGGAATTC ATGGGCAGAC CCGTCTGTAC TTTAAGAGTG 

3 61 TTGGCAACCA GTAATGAATA AAAACTCCCG TTTTATTATA TTTGATGAAT? GCTGAAAGCT 

421 TACATTAATA TGTCGTGCGA <TGGCA CGAAA AAACACACGC AAACAATACA GGGGGGTAkT 

481 CGGCGGGCGG CTAAGGGTGG TGCTCGGCGG GCAGAACAtfC GAAAAATCAA GATCTATATG 

541 AATTACACTT CCTCCGTAGG AGGAAGGACA GGGGGAGAAT ACCACTTCTC CCCCGGCGAC 

601 ATAATGTAAA TGACGCAGTT TGCCTCGAAA TACTCCAGCT GCCCTGGAGT CATTTCCTTC 

661 ATCCAATCTT CATCCGA.GTT GGCGAGGATT ATTGTAGGCT TAGACTTCTT CTGCACCTTT 

721 TTCTTCTTAC CATACTTGGG GTTTACAATG AAATCCCTCT GACAG CCAAC TAACTGTTTC 

7 81 CAACAAGGAC AGAATTTAAA CGGAATATCA TCTACGATGT TGTAGATTGC GTCTTCGTTG 

841 TATGAAGACC AATCAACATT ATTTTGCCAG TAATTATGAA CCCCTAGGCT TCTGGCCCAA 

901 GTAGATTTTC CGGTTCTTGX TGGGCCGACG ATGTAGAGGC TCTGCTTTCT TGATCTTTCA 

961 TCTGATGACT GGATACAGAA TCCATCCATT GGAGGTCAGA AATTGCATCC TCGAGGGTAT 

1021 AACAGGTAGG TTGAAGGAGC ATGTAAGCTT CGGGACTAAC CTGGAAGATG TTAGGCTGGA 

1081 GCCAATCGTT GATTGACTCA TTACAAAGTA AATCAGGTGA GGAGGGTGGA TGAGGATTGG 

1141 TGAACTCTTC CTGAATCTCA GGAAAAAGCT TATTTGCAGA GTATTGAAAA TACTGCAATT 

12 01 TTGTGGACCA ATCAAAGGGG AGCTCTTTCT GGATCATGGA GAGGTACXCT TCTTTGGAGG 
1261 TAG CGTGTGA AATAATGTCT CGCATTATTT CATCTTTAGA AGGCTTTTTT TCCTTTACCT 

13 21 CTGAATCAGA TT^TCCTAGG AAGGGGGACT TCCTAGGAAT GAAAGTACCT CTCTCAAACA 
13 ai CAGCCAGAGG TTCCTTGAGA ATGTAATCCC TCACTCTGTT AACTGACTTG GCACTCTGAA 
1441 TATTTGGGTG AAACCCATTT ATATCAAAGA ACCTTGAGTC AGATATCCTT ATCGGCTTCT 
1501 CTGGCTGAAG CAATGCATGT AAATGCAAAC TTCCATCTTT ATGTGCCTCT CGGG CA CAT A 
1561 GAATATATTT GGGAATCCAA CGAACGACGA GCTCCCAGAT CATCTGACAG GCGATTTCAG 
1621 GATTTTCTGG ACACTTTGGA TAGGTTAGGA ACGTGTTAGC GTTCCTGTGT GAGAACTGAC 
1681 GGTTGGATGA GGAGGAGGCC ATAGCCGACG ACGGAGGTTG AGGCTGAGGG ATGGCAGACT 
1741 GGGAGCTCCA' AACTCTATAG TATACCCGTG CGCCTTCGAA ATCCGCCGCT CCATTGTCTT 
1601 ATAGTGGTTG TAAATGGGCC GGACCGGGCC GGCCCAGCAG GAAAAGAAGG CGCGCACTAA 
18 61 TATTACCG CG CCTTCTTTTC CTGCGAGGGC CCGGTAGGGA CCGAGCGCTT TGATTTAAAG 
1921 CCTGGTTCTG CTTTGCGGCC GCTCGAGCAT GCATCTAGAG GGCCCAATTC GCCCTATAGT 
1981 GAGTCGTATT ACAATTCACT GGCCGTCGTT TTACAACGTC GTGACTGGGA AAACCCTGGC 
2041 GTTACCCAAC TTAATCGCCT TG C AG CACAT CCCCCTTTCG CCAGCTGGCG ^AATAGCGAA 
2101 GAGGCCCGCA CCGATCGCCC TTCCCAACAG. TTGCGCAGCC TATACGTACG GCAGTTTAAG 
2161 GTTTACACCT ATAAAAGAGA GAGCCGTTAT CGTCTGTTTG TGGATGTACA GAGTGATATT 
2221 ATTGACACGC CGGGGCGACG GATGGTGATC CCCCTGGCCA GTGCACGTCT GCTGTCAGAT 
2281 AAAGTCTCCC GTGAACTTTA CCCGGTGGTG CATATCGGGG ATGAAAGCTG GCGCATGATG 
2341 ACCACCGATA TGGCCAGTGT GCCGGTCTCC GTTATCGGGG AAGAAGTGGC TGATCTCAGC 
24 01 CACCG CG AAA ATGACATCAA AAACGCCATT AACCTGATGT TCTGGGGAAT ATAAATGTCA 
2461 GGCCTGAATG GCGAATGGAC GCG CCCTGTA GCGGCGCATT AAGCGCG CGG GTGTGGTGGT 
2521 TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG CCCGCTCCTT TCGCTTTCTT 
2581 CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA GCTCTAAATC GGGGGCTCCC 
.2 641 TTTAGGGTTC CGATTTAGAG CTTTACGGCA CCTCGACCGC AAAAAACTTG ATTTGGGTGA 
2701 TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT CGCCCTTTGA CGTTGGAGTC 
2761 CACGTTCTTT AATAGTGGAC TCTTGTTCCA AACTGGAACA ACACTCAACC CTATCGCGGT 
2821 CTATTCTTTT GATTTATAAG GGATGTTGCC GATTTCGGCC TATTGGTTAA AAAATGAGCT
2881 GATTTAACAA AAATTTTAAC AAAATTCAGA AGAACTCGTC AAGAAGGCGA TAGAAGGCGA
Figure 4 (cont'd)

2941 TGCGCTGCGA ATCGGGAGCG GCGATACCGT AAAGCACGAG GAAGCGGTCA GCCCATTCGC

3001 CGCCAAGCTC TTCAGCAATA TCACGGGTAG CCAACGCTAT GTCCTGATAG CGGTCCGCCA

3061 CACCCAGCCG GCCACAGTCG ATGAATCCAG AAAAGCGGCC ATTTTCCACC ATGATATTCG

3121 GCAAGCAGGC ATCGCCATGG GTCACGACGA GATCCTCGCC GTCGGGCATG CTCGCCTTGA 

3181 GCCTGGCGAA CAGTTCGGCT GGCGCGAGCC CCTGATGCTC TTCGTCCAGA TCATCCTGAT 

3241 CGACAAGACC GGCTTCCATC CGAGTACGTG CTCGCTCGAT GCGATGTTTC GCTTGGTGGT 

3301 CGAATGGGCA GGTAGCCGGA TCAAGCGTAT GCAGCCGCCG CATTGCATCA GCCATGATGG

3361 ATACTTTCTC GGCAGGAGCA AGGTGAGATG ACAGGAGATC CTGCCCCGGC ACTTCGCCCA

3421 ATAGCAGCCA GTCCCTTCCC GCTTCAGTGA CAACGTCGAG CACAGCTGCG CAAGGAACGC 

3481 CCGTCGTGGC CAGCCACGAT AGCCGCGCTG CCTCGTCTTG CAGTTCATTC AGGGCACCGG 

3541 ACAGGTCGGT CTTGACAAAA AGAACCGGGC GCCCCTGCGC TGACAGCCGG AACACGGCGG 

3601 CATCAGAGCA GCCGATTGTC TGTTGTGCCC AGTCATAGCC GAATAGCCTC TCCACCCAAG 

3661 CGGCCGGAGA ACCTGCGTGC AATCCATCTT GTTCAATCAT GCGAAACGAT CCTCATCCTG 

3721 TCTCTTGATC AGATCTTGAT CCCCTGCGCC ATCAGATCCT TGGCGGCGAG AAAGCCATCC 

3781 AGTTTACTTT GCAGGGCTTC CCAACCTTAC CAGAGGGCGC CCCAGCTGGC AATTCCGGTT 

3841 CGCTTGCTGT CCATAAAACC GCCCAGTCTA GCTATCGCCA TCTAAGCCCA CTGGAAGCTA

3901 CCTGCTTTCT CTTTGCGCTT GCGTTTTCCC TTGTCCAGAT AGCCCAGTAG CTGACATTCA

3961 TCCGGGGTCA GCACCGTTTC TGCGGACTGG CTTTCTACGT GAAAAGGATC TAGGTGAAGA

4021 TCCTTTTTGA TAATCTCATG ACCAAAATCC CTTAACGTGA GTTTTCGTTC CACTGAGCGT 

4081 CAGACCCCGT AGAAAAGATC AAAGGATCTT CTTGAGATCC TTTTTTTCTG CGCGTAATCT 

4141 GCTGCTTGCA AACAAAAAAA CCACCGCTAC CAGCGGTGGT TTGTTTGCCG GATCAAGAGC

4201 TACCAACTCT TTTTCCGAAG GTAACTGGCT TCAGCAGAGC GCAGATACCA AATACTGTCC

4261 TTCTAGTGTA GCCGTAGTTA GGCCACCACT TCAAGAACTC TGTAGCACCG CCTACATACC

4321 TCGCTCTGCT AATCCTGTTA CCAGTGGCTG CTGCCAGTGG CGATAAGTCG TGTCTTACCG 

4381 GGTTGGACTC AAGACGATAG TTACCGGATA AGGCGCAGCG GTCGGGGTGA ACGGGGGGTT

4441 CGTGCAGACA GCCCAGCTTG GAGCGAACGA CCTACACCGA ACTGAGATAC CTACAGCGTG

4501 AGCTATGAGA AAGCGCCACG CTTCCCGAAG GGAGAAAGGC GGACAGGTAT CCGGTAAGCG 

4561 GCAGGGTCGG AACAGGAGAG CGCACGAGGG AGCTTCCAGG GGGAAACGCC TGGTATCTTT 

4621 ATAGTCCTGT CGGGTTTCGC CACCTCTGAC TTGAGCGTCG ATTTTTGTGA TGCTCGTCAG

4681 GGGGGCGGAG CCTATGGAAA AACGCCAGCA ACGCGGCCTT TTTACGGTTC CTGGGCTTTT

4741 GCTGGCCTTT TGCTCACATG TTCTTTCCTG CGTTATCCCC TGATTCTGTG GATAACCGTA 

4801 TTACCGCCTT TGAGTGAGCT GATACCGCTC GCCGCAGCCG AACGACCGAG CGCAGCGAGT 

4861 CAGTGAGCGA GGAAGCGGAA G 
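Each listing in these figures appears to follow the GenBank-style ORIGIN layout: the number at the start of a line is the 1-based position of that line's first base, and bases are grouped in blocks of ten. The Python sketch below assumes that layout. It joins a listing into one sequence and raises an error when a line's coordinate disagrees with the running base count, or when a line contains anything other than A, C, G or T, which is one way to catch transcription errors in listings like these. The function name parse_listing is illustrative only.

```python
import re

def parse_listing(lines):
    """Join a numbered sequence listing into a single uppercase string.

    Assumes each data line reads '<start> <block> <block> ...', where <start>
    is the 1-based position of the line's first base. Blank lines and the
    'ORIGIN' header line are skipped.
    """
    parts = []
    length = 0
    for raw in lines:
        fields = raw.split()
        if not fields or fields[0].upper() == "ORIGIN":
            continue
        start = int(fields[0])
        bases = "".join(fields[1:]).upper()
        # Reject anything that is not A, C, G or T (e.g. scanning debris).
        if not re.fullmatch(r"[ACGT]+", bases):
            raise ValueError(f"line {start}: unexpected characters in {bases!r}")
        # The stated coordinate must equal the number of bases read so far + 1.
        if start != length + 1:
            raise ValueError(f"line labelled {start}, expected {length + 1}")
        parts.append(bases)
        length += len(bases)
    return "".join(parts)


listing = [
    "ORIGIN",
    "1 AGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC",
    "61 ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC",
]
seq = parse_listing(listing)
print(len(seq))  # 120
```

Checking coordinates against the running count, rather than trusting them, means a misread line number or a dropped block is reported at the line where it occurs.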
Figure 5



pMSVLSB-2: 3413 bp; 

Composition 777 A; 950 C; 884 G; 802 T; 0 OTHER
Percentage: 23% A; 28% C; 26% G; 23% T; 0% OTHER

Molecular Weight (kDa): ssDNA: 1052.40 dsDNA: 2104.2
ORIGIN 

1 AGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC

61 ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 

121 TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA

181 TTGTGAGCGG ATAACAATTT CACACAGGAA ACAGCTATGA CCATGATTAC GCCAAGCTAT 

241 TTAGGTGACA CTATAGAATA CTCAAGCTAT GCATCAAGCT TGGGCCCGGT AGGGACCGAG 

301 CGCTTTGATT TAAAGCCTGG TTCTGCTTTG TATGATTTAT CTAAAGCAGC CCAATCTAAA

361 GAAACCGGTC CCGGGCACTA TAAATTGCCT AACAAGTGCG ATTCATTCAT GGATCCTTTA

421 AACTCGAGTC TAGAGGGCCC GAATTCTGCA GATATCCATC ACACTGGCGG CCGCTCGAGC

481 ATGCATCTAG AGGGCGCAAT TCGCCCTATA GTGAGTCGTA TTACAATTCA CTCGCCGTCG

541 TTTTACAACG TCGTGACTGG GAAAACCCTG GCGTTACCCA ACTTAATCGC CTTGCAGCAC 

601 ATCCCCCTTT CGCCAGCTGG CGTAATAGCG AAGAGGCCCG CACCGATCGC CCTTCCCAAC 

661 AGTTGCGCAG CCTATACGTA CGGCAGTTTA AGGTTTACAC CTATAAAAGA GAGAGCCGTT

721 ATCGTCTGTT TGTGGATCTA CAGAGTGATA TTATTGACAC GCCGGGGCGA CGGATGGTGA 

781 TCCCCCTGGC CAGTGCACGT CTGCTGTCAG ATAAAGTCTC CCGTGAACTT TACCCGGTGG 

841 TGCATATCGG GGATGAAAGC TGGCGCATGA TGACCACCGA TATGGCCAGT GTGCCGGTCT

901 CCGTTATCGG GGAAGAAGTG GCTGATCTCA GCCACCGCGA AAATGACATC AAAAACGCCA 

961 TTAACCTGAT GTTCTGGGGA ATATAAATGT CAGGCCTGAA TGGCGAATGG ACGCGCCCTG 

1021 TAGCGGCGCA TTAAGCGCGC GGGTGTGGTG GTTACGCGCA GCGTGACCGC TACACTTGCC

1081 AGCGCCCTAG CGGCCGCTCC TTTCGCTTTC TTCCCTTCCT TTCTCGCCAC GTTCGCCGGC 

1141 TTTCCCCGTC AAGCTCTAAA TCGGGGGCTC CCTTTAGGGT TCCGATTTAG AGCTTTACGG

1201 CACCTCGACC GCAAAAAACT TGATTTGGGT GATGGTTCAC GTAGTGGGCC ATCGCCCTGA

1261 TAGACGGTTT TTCGCCCTTT GACGTTGGAG TCCACGTTCT TTAATAGTGG ACTCTTGTTC

1321 CAAACTGGAA CAACACTCAA CCCTATCGCG GTCTATTCTT TTGATTTATA AGGGATGTTG

1381 CCGATTTCGG CCTATTGGTT AAAAAATGAG CTGATTTAAC AAAAATTTTA ACAAAATTCA

1441 GAAGAACTCG TCAAGAAGGC GATAGAAGGC GATGCGCTGC GAATCGGGAG CGGCGATACC 

1501 GTAAAGCACG AGGAAGCGGT CAGCCCATTC GCCGCCAAGC TCTTCAGCAA TATCACGGGT 

1561 AGCCAACGCT ATGTCCTGAT AGCGGTCCGC CACACCCAGC CGGCCACAGT CGATGAATCC 

1621 AGAAAAGCGG CCATTTTCCA CCATGATATT CGGCAAGCAG GCATCGCCAT GGGTCACGAC 

1681 GAGATCCTCG CCGTCGGGCA TGCTCGCCTT GAGCCTGGCG AACAGTTCGG CTGGCGCGAG 

1741 CCCCTGATGC TCTTCGTCCA GATCATCCTG ATCGAGAAGA CCGGCTTCCA TCCGAGTACG 

1801 TGCTCGCTCG ATGCGATGTT TCGCTTGGTG GTCGAATGGG CAGGTAGCCG GATCAAGCGT

1861 ATGCAGCCGC CGCATTGCAT CAGCCATGAT GGATAGTTTC TCGGCAGGAG CAAGGTGAGA 

1921 TGACAGGAGA TCCTGCCCCG GCACTTCGCC CAATAGCAGC CAGTCCCTTC CCGCTTCAGT 

1981 GACAACGTCG AGCACAGCTG CGCAAGGAAC GCCCGTCGTG GCCAGCCACG ATAGCCGCGC

2041 TGCCTCGTCT TGCAGTTCAT TCAGGGCACC GGACAGGTCG GTCTTGACAA AAAGAACCGG 

2101 GCGCCCCTGC GCTGACAGCC GGAACACGGC GGCATCAGAG CAGCCGATTG TCTGTTGTGC 

2161 CCAGTCATAG CCGAATAGCC TCTCCACCCA AGCGGCCGGA GAACCTGCGT GCAATCCATC 

2221 TTGTTCAATC ATGCGAAACG ATCCTCATCC TGTCTCTTGA TCAGATCTTG ATCCCCTGCG 

2281 CCATCAGATC CTTGGCGGCG AGAAAGCCAT CCAGTTTACT TTGCAGGGCT TCCCAACCTT 

2341 ACCAGAGGGC GCCCCAGCTG GCAATTCCGG TTCGCTTGCT GTCCATAAAA CCGCCCAGTC 

2401 TAGCTATCGC CATGTAAGCC CACTGCAAGC TACCTGCTTT CTCTTTGCGC TTGCGTTTTC

2461 CCTTGTCCAG ATAGCCCAGT AGCTGACATT CATCCGGGGT CAGCACCGTT TCTGCGGACT

2521 GGCTTTCTAC GTGAAAAGGA TCTAGGTGAA GATCCTTTTT GATAATCTCA TGACCAAAAT 

2581 CCCTTAACGT GAGTTTTCGT TCCACTGAGC GTCAGACCCC GTAGAAAAGA TCAAAGGATC

2641 TTCTTGAGAT CCTTTTTTTC TGCGCGTAAT CTGCTGCTTG CAAACAAAAA AACCACCGCT 

2701 ACCAGCGGTG GTTTGTTTGC CGGATCAAGA GCTACCAACT CTTTTTCCGA AGGTAACTGG

2761 CTTCAGCAGA GCGCAGATAC CAAATACTGT CCTTCTAGTG TAGCCGTAGT TAGGCCACCA 

2821 CTTCAAGAAC TCTGTAGCAC CGCCTACATA CCTCGCTCTG CTAATCCTGT TACCAGTGGC

2881 TGCTGCCAGT GGCGATAAGT CGTGTCTTAC CGGGTTGGAC TCAAGACGAT AGTTACCGGA
Figure 5 (cont'd)



2941 TAAGGCGCAG CGGTCGGGCT GAACGGGGGG TTCGTGCACA CAGCCCAGCT TGGAGCGAAC 

3001 GACCTACACC GAACTGAGAT ACCTACAGCG TGAGCTATGA GAAAGCGCCA CGCTTCCCGA

3061 AGGGAGAAAG GCGGACAGGT ATCCGGTAAG CGGCAGGGTC GGAACAGGAG AGCGCACGAG

3121 GGAGCTTCCA GGGGGAAACG CCTGGTATCT TTATAGTCCT GTCGGGTTTC GCCACCTCTG 

3181 ACTTGAGCGT CGATTTTTGT GATGCTCGTC AGGGGGGCGG AGCCTATGGA AAAACGCCAG

3241 CAACGCGGCC TTTTTACGGT TCCTGGGCTT TTGCTGGCCT TTTGCTCACA TGTTCTTTCC 

3301 TGCGTTATCC CCTGATTCTG TGGATAACCG TATTACCGCC TTTGAGTGAG CTGATACCGC

3361 TCGCCGCAGC CGAACGACCG AGCGCAGCGA GTCAGTGAGC GAGGAAGCGG AAG
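The header of each figure reports base counts and rounded percentages. A minimal Python sketch of how such a header can be computed from a sequence, and cross-checked against the stated length, is shown below. It assumes that each percentage is the count divided by the total length, rounded to the nearest whole number, which reproduces the values printed for pMSVLSB-2. The helper names base_counts and percentages are hypothetical.

```python
from collections import Counter

def base_counts(seq):
    """Count A, C, G and T (case-insensitive) plus any other characters."""
    counts = Counter(seq.upper())
    result = {base: counts[base] for base in "ACGT"}
    result["OTHER"] = len(seq) - sum(result.values())
    return result

def percentages(counts):
    """Percentage of each category, rounded to the nearest whole number."""
    total = sum(counts.values())
    return {key: round(100 * n / total) for key, n in counts.items()}


# Counts reported in the Figure 5 header for pMSVLSB-2.
reported = {"A": 777, "C": 950, "G": 884, "T": 802, "OTHER": 0}
print(sum(reported.values()))     # 3413, the stated length
print(percentages(reported))      # {'A': 23, 'C': 28, 'G': 26, 'T': 23, 'OTHER': 0}
print(base_counts("AGCGCCCAAT"))  # {'A': 3, 'C': 4, 'G': 2, 'T': 1, 'OTHER': 0}
```

Because the four counts must sum to the stated length, a header in which they do not is a quick signal that one of the printed numbers was misread.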



5b/9 



WO 01/77350 



PCT/US01/11436 



Figure 6 



pMSVLSB-3:

pMSVLSB-2 with Apa fragment inserted: 4961 bp;
Composition 1190 A; 1276 C; 1262 G; 1233 T; 0 OTHER
Percentage: 24% A; 26% C; 25% G; 25% T; 0% OTHER

Molecular Weight (kDa): ssDNA: 1531.26 dsDNA: 3058.5
ORIGIN 

1 AGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC 

61 ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC

121 TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 
181 TTGTGAGCGG ATAACAATTT CACACAGGAA ACAGCTATGA CCATGATTAC GCCAAGCTAT 
241 TTAGGTGACA CTATAGAATA CTCAAGCTAT GCATCAAGCT TGGTACCGAG CTCGGATCCA 
301 CTAGTAACGG CCGCCAGTGT GCTGGAATTC ATGGGCAGAC CCGTCTGTAC TTTAAGAGTG
361 TTGGCAACCA GTAATGAATA AAAACTCCCG TTTTATTATA TTTGATGAAT GCTGAAAGCT
421 TACATTAATA TGTCGTGCGA TGGCACGAAA AAACACACGC AAACAATACA GGGGGGTAGT
481 CGGCGGGCGG CTAAGGGTGG TGCTCGGCGG GCAGAACATC GAAAAATCAA GATCTATATG 
541 AATTACACTT CCTCCGTAGG AGGAAGCACA GGGGGAGAAT ACCACTTCTG CCCCGGCGAC 
601 ATAATGTAAA TGACGCAGTT TGCCTCGAAA TACTCCAGCT GCCCTGGAGT CATTTCCTTC
661 ATCCAATCTT CATCCGAGTT GGCGAGGATT ATTGTAGGCT TAGACTTCTT CTGCACCTTT 
721 TTCTTCTTAC CATACTTGGG GTTTACAATG AAATCCCTCT GACAGCCAAC TAACTGTTTC 
781 CAACAAGGAC AGAATTTAAA CGGAATATCA TCTACGATGT TGTAGATTGC GTCTTCGTTG 
841 TATGAAGACC AATCAACATT ATTTTGCCAG TAATTATGAA CCCCTAGGCT TCTGGCCCAA 
901 GTAGATTTTC CGGTTCTTGT TGGGCCGACG ATGTAGAGGC TCTGCTTTCT TGATCTTTCA
961 TCTGATGACT GGATACAGAA TCCATCCATT GGAGGTCAGA AATTGCATCC TCGAGGGTAT 
1021 AACAGGTAGG TTGAAGGAGC ATGTAAGCTT CGGGACTAAC CTGGAAGATG TTAGGCTGGA 
1081 GCCAATCGTT GATTGACTCA TTACAAAGTA AATCAGGTGA GGAGGGTGGA TGAGGATTGG 
1141 TGAACTCTTC CTGAATCTCA GGAAAAAGCT TATTTGCAGA GTATTCAAAA TACTGCAATT 
1201 TTGTGGACCA ATCAAAGGGG AGCTCTTTCT GGATCATGGA GAGGTACTCT TCTTTGGAGG
1261 TAGCGTGTGA AATAATGTCT CGCATTATTT CATCTTTAGA AGGCTTTTTT TCCTTTACCT
1321 CTGAATCAGA TTTTCCTAGG AAGGGGGACT TCCTAGGAAT GAAAGTACCT CTCTCAAACA 

1381 CAGCCAGAGG TTCCTTGAGA ATGTAATCCC TCACTCTGTT AACTGACTTG GCACTCTGAA 

1441 TATTTGGGTG AAACCCATTT ATATCAAAGA ACCTTGAGTC AGATATCCTT ATCGGCTTCT 

1501 CTGGCTGAAG CAATGCATGT AAATGCAAAC TTCCATCTTT ATGTGCCTCT CGGGCACATA 

1561 GAATATATTT GGGAATCCAA CGAACGACGA GCTCCCAGAT CATCTGACAG GCGATTTCAG 

1621 GATTTTCTGG ACACTTTGGA TAGGTTAGGA ACGTGTTAGC GTTCCTGTGT GAGAACTGAC 

1681 GGTTGGATGA GGAGGAGGCC ATAGCCGACG ACGGAGGTTG AGGCTGAGGG ATGGCAGACT 

1741 GGGAGCTCCA AACTCTATAG TATACCCGTG CGCCTTCGAA ATCCGCCGCT CCATTGTCTT 

1801 ATAGTGGTTG TAAATGGGCC GGACCGGGCC GGCCCAGCAG GAAAAGAAGG CGCGCACTAA

1861 TATTACCGCG CCTTCTTTTC CTGCGAGGGC CCGGTAGGGA CCGAGCGCTT TGATTTAAAG

1921 CCTGGTTCTG CTTTGTATGA TTTATCTAAA GCAGCCCAAT CTAAAGAAAG CGGTCCCGGG 

1981 CACTATAAAT TGCCTAACAA GTGCGATTCA TTCATGGATC CTTTAAACTC GAGTCTAGAG 

2041 GGCCCAATTC GCCCTATAGT GAGTCGTATT ACAATTCACT GGCCGTCGTT TTACAACGTC 

2101 GTGACTGGGA AAACCCTGGC GTTACCCAAC TTAATCGCCT TGCAGCACAT CCCCCTTTCG

2161 CCAGCTGGCG TAATAGCGAA GAGGCCCGCA CCGATCGCCC TTCCCAACAG TTGCGCAGCC 

2221 TATACGTACG GCAGTTTAAG GTTTACACCT ATAAAAGAGA GAGCCGTTAT CGTCTGTTTG 

2281 TGGATGTACA GAGTGATATT ATTGACACGC CGGGGCGACG GATGGTGATC CCCCTGGCCA

2341 GTGCACGTCT GCTGTCAGAT AAAGTCTCCC GTGAACTTTA CCCGGTGGTG CATATCGGGG 

2401 ATGAAAGCTG GCGCATGATG ACCACCGATA TGGCCAGTGT GCCGGTCTCC GTTATCGGGG

2461 AAGAAGTGGC TGATCTCAGC CACCGCGAAA ATGACATCAA AAACGCCATT AACCTGATGT

2521 TCTGGGGAAT ATAAATGTCA GGCCTGAATG GCGAATGGAC GCGCCCTGTA GCGGCGCATT 

2581 AAGCGCGCGG GTGTGGTGGT TACGCGCAGC GTGACCGCTA CACTTGCCAG CGCCCTAGCG

2641 CCCGCTCCTT TCGCTTTCTT CCCTTCCTTT CTCGCCACGT TCGCCGGCTT TCCCCGTCAA

2701 GCTCTAAATC GGGGGCTCCC TTTAGGGTTC CGATTTAGAG CTTTACGGCA CCTCGACCGC 

2761 AAAAAACTTG ATTTGGGTGA TGGTTCACGT AGTGGGCCAT CGCCCTGATA GACGGTTTTT
Figure 6 (cont'd)



CGCCCTTTGA CGTTGGAGTC CACGTTCTTT AATAGTCGAC TCTTGTTCCA AACTGGAACA 
SaScaJcc CTATCGCGGT CTATTCTTTT GATTTATAAG GGATGTTGCC GATTTCGGCC 
SSSScT GATTTAACAA AAATTTTAAC AAAATTCAGA AGAACTCGTC 

ISggcga TAGAAGGCGA tgcgctgcga atcgggagcg GCGATACCGT aaagcacgag 

^SSSSSS JSSSc CGCCAAGCTC TTCAG CAATA TCACGGGTAG CCAACGCTAT 

gtcSSS cStcSSca cacccagcco gccacagtcg atgaatccag.aaaagcggcc 

ATOATATTCG GCAAGCAGGC ATCGCCATGG GTCACGACGA GATCCTCGCC 
SS^S SJSccSgA GCCTGGCGAA CAGTOCGGCT GGCGCGAGCC CCTGATGCTC 
^SSSiS TCATC^TGAT CGACAAGACC GGCTTCCATC CGAGTACGTG CTCGCTCGAT 
SSaSS?J gSSSSSt CGAATCGGCA GGTAGCCGGA TCAAGCGTAT GCAGCCGCCG 
SSSSS ATACTTTCTC GGCAGGAGCA AGGTGAGATG ACAGGAGATC 

S^SSS ScttcGccca ATAGCAGCCA GTCCCTTCCC GCTTCAGTGA CAACGTCGAG 
SSSgc CCGTCGTCGC CAGCCACGAT AGCCGCGCTG cctcgtcitg 

SSgcaccgg acaggtcggt cttcacaaaa agaaccgggc gcccctgcgc 
S^SSSS SSSS Jatcagagca gccgaitgtc tgttgtgccc agtcatagcc 

SaS^CCTC TCCACCCAAG CGGCCGGAGA ACC?TGCX5TGC AATCX!ATCTX. gttcaatcat 
CCTCATbcTG TCTCTTGATC AGATCTTGAT CCCCTGCGCC ATCAGATCCT 

3781 2^^™ ScStcc agottacttt gcagggcttc ccaaccttac cagagggcgc 

3841 SSSSSc SSSSS SSS^ CCATAAAACC GCCCAGTCTA GCTATCGCCA 

, ! S^SSS ?J5SSSa cxtcctttct ctttgcgctt gcgttttccc ttgtotagat 
39 ™^ r^ACATTCA tccggggtca gcaccgtttc tgcggactgg ctttctacgt 
1EJ SSaStc SSSSS S?Sttga taatcxcatg accaaaatcc cttaacgtga 
SJJJSS SSSgt cagaccccgt agaaaagatc aaaggatctt cttcagatcc 

^^^^ CGCGTAATCT GCTOCTTGCA AACAAAAAAA CCACCGCTAC CAGCGOTGGT 
4201 TTTTTTTCTG CGCGXAAXCJ. ^ppn^r. nrainfflrr TCAGCAGAGC 



2821 
2881 
2941 
3001 
3061 
3121 
3181 
3241 
3301 
3361 
3421 
3481 
3541 
3601 
3661 
3721 



3901 
39.61 
4021 



4261 
4321 
4381 
4441 
45.01 
4561 
4621 
4681 
4741 
4801 
4861 



GA^AAGAGC £££££ TTTTCCGAAG GTAACTGGCT «3C 
S^SX AATACTGTCC TTCTAGTGTA GCCGTAGTTA GGCCACCACT TCAAGAACTC 
™S JSicATACC TCGCTCTGCT AATCCTGTTA CCAGTGGCTG CTGCCAGTGG 

££5S££w tgtcttaccg ggttggactc aagacgataq ttaccggata aggcg cagcg 
SSgg^SS acggggggtt cgtccacaca gcccagc™ gagcgaacga cctacaccga 

TZ^lw PTACAGCGTG AGCTATGAGA AAGCGCCACG CTTCCCGAAG GGAGAAAGGC 

Sgac^S? Sgg?aSS SSSgtcgg aacaggagag cgcacgaggg Agcttccagg 

£££SS SSaJcS ATAGTCC^T CGGGTTTCGC CAdCTCTOAC JTGAGCGTCG 
ATTTTTGTCA TGCTCGTCAG GGGGGCGGAG CCTATGGAAA AACGCCAGCA ACGCGGCCTT 

^SSgttc CTGGGCTTTT gctggccttt tgctcacatg ttctitcctg CGTTATCCCC 
JSSSg^ StaaSS ttaccgcctt tgagtgagct gataccgctc gccgcagccg 



4921 aacgaccgag cgcagcgagt cagtgagcga ggaagcggaa g 
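Figure 6 describes pMSVLSB-3 as pMSVLSB-2 with an Apa fragment inserted. Assuming "Apa" refers to the restriction enzyme ApaI, whose recognition sequence is GGGCCC, the Python sketch below lists the 1-based start positions of that site in a sequence. Because GGGCCC is its own reverse complement, scanning one strand is sufficient; the circular option also catches a site that spans the end and start of a circular plasmid. The function find_sites is hypothetical and is not taken from the source.

```python
def find_sites(seq, site="GGGCCC", circular=True):
    """Return 1-based start positions of `site` in `seq`, overlaps included.

    GGGCCC (assumed ApaI site) is its own reverse complement, so scanning one
    strand finds every site. With circular=True the scan also covers matches
    that run across the end/start junction of a circular plasmid.
    """
    seq, site = seq.upper(), site.upper()
    search = seq + seq[: len(site) - 1] if circular else seq
    hits = []
    i = search.find(site)
    while i != -1 and i < len(seq):
        hits.append(i + 1)
        i = search.find(site, i + 1)
    return hits


# Figure 5, line beginning at position 241 (a fragment, so not circular).
line_241 = "TTAGGTGACA CTATAGAATA CTCAAGCTAT GCATCAAGCT TGGGCCCGGT AGGGACCGAG"
print(find_sites(line_241.replace(" ", ""), circular=False))  # [42]
```

Offset 42 within the line that begins at 241 corresponds to position 282 of pMSVLSB-2. For a whole plasmid the default circular=True is the appropriate setting; for an isolated fragment it would report spurious junction matches, hence circular=False in the usage line.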
Figure 7



pMSVLSB-4: 6309 bp;

Composition 1522 A; 1620 C; 1590 G; 1577 T; 0 OTHER
Percentage: 24% A; 26% C; 25% G; 25% T; 0% OTHER

Molecular Weight (kDa): ssDNA: 1947.08 dsDNA: 3889.6

ORIGIN

1 AGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC

61 ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC

121 TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA

181 TTGTGAGCGG ATAACAATTT CACACAGGAA ACAGCTATGA CCATGATTAC GCCAAGCTAT

241 TTAGGTGACA CTATAGAATA ctcaagctat gcatgaagct tggtaccgag ctcggatcca

301 ctagtcccga tctagtaaca tagatgacac cgcgcgggat aatttatcct agtttgcgcg 

361 CTATATTTTG TTTTCTATCG CGTATTAAAT GTATAATTGC GGGACTCTAA TCATAAAAAC

421 CCATCTCATA AATAACGTCA TGCATTACAT GTTAATTATT ACATGCTTAA CGTAATTCAA

481 CAGAAATTAT ATGATAATCA TCGACAGACC GGCAACAGGA TTCAATCTTA AGAAACTTTA

541 TTGCCAAATG TTTGAACGAT CGGGGAAATT CGCTCGAGTT AATTAAGCGG CCGCCTCAAA

601 AAGGATCTTC ACCTAGATCC TTTTAAATTA AAAATGAAGT TTTAGCACGT GTCAGTCCTG 

661 CTCCTCGGCC ACGAAGTGCA CGCAGTTGCC GGCCGGGTCG CGCAGGGCGA ACTCCCGCCC

721 CCACGGCTGC TCGCCGATCT CGGTCATGGC CGGCCCGGAG GCGTCCCGGA AGTTCGTGGA

781 CACGACCTCC GACCACTCGG CGTACAGCTC GTCCAGGCCG CGCACCCACA CCCAGGCCAG

841 GGTGTTGTCC ggcaccacct GGTCCTGGAC CGCGCTGATG AACAGGGTCA CGTCGTCCCG

901 gaccacaccg gcgaagtcgt cctccacgaa gtcccgggag aacccgagcc ggtcggtcca

961 GAACTCGACC gctccggcga cgtcgcgcgc ggtgagcacc ggaacggcac tggtcaactt

1021 GGCCATGGTG gccctcctca cgtgctatta ttgaagcatt tatcagggtt attgtctcat

1081 GAGCGGATAC ATATTTGAAT GTATTTAGAA AAATAAACAA ATAGGGGTTC CGCGCACATT

1141 TCCCCGAAAA GTGCCACCTG TATGCGGTGT GAAATACCGC ACAGATGCGT AAGGAGAAAA

1201 TACCGCATCA GGCGAAATTG TAAACGCGGC CGCTTAATTA AGTCGACGTC CTCTCCAAAT

1261 gaaatgaact tccttatata GAGGAAGGGT cttgcgaagg atagtgggat tgtgcgtcat

1321 CCCTTACGTC agtggagata tcacatcaat ccacttgctt tgaagacgtg gttggaacgt

1381 CTTCTTTTTC CACGTAGCTC CTCGTGGGTG GGGGTCCATC TTTGGGACCA CTGTCGGCAG

1441 AGGCATCTTG aacgatagcc tttccttatc gcaatgatgg catttgtagg tgccaccttc

1501 CTTTTCTACT GTCCTTTTGA TGAAGTGACA GATAGCTGGG CAATGGAATC CGAGGAGGTT

llll SSSaS SSStgtt gaaaagtctc aatagccctt tggtcttctg agac^tatc 

1621 TTTGATATTC TTGGAGTAGA CGAGAGAGTG TCGTGCTCCA CCATGTTGAC GAATTCATGG

1681 GCAGACCCGT CTGTACTTTA AGAGTGTTGG caaccagtaa tgaataaaaa ctcccgtttt

1741 ATTATATTTG atgaatgctg aaagcttaca ttaatatgtc gtgcgatggc acgaaaaaac

1801 ACACGCAAAC aatacagggg ggtagtcggc gggcggctaa gggtggtgct cggcgggcag

1861 AACATCGAAA aatcaagatc tatatgaatt acacttcctc cgtaggagga agcacagggg

1921 GAGAATACCA cttctccccc ggcgacataa tgtaaatgac gcagtttgcc tcgaaatact

1981 CCAGCTGCCC TGGAGTCATT TCCTTCATCC AATCTTCATC CGAGTTGGCG AGGATTATTG

2041 TAGGCTTAGA cttcttctgc acctttttct tcttaccata CTTGGGGTTT acaatgaaat

2101 CCCTCTGACA GCCAACTAAC tgtttccaac aaggacagaa tttaaacgga atatcatcta

2161 CGATGTTGTA gattgcgtct tcgttgtatg aagaccaatc aacattattt tgccagtaat

2221 TATGAACCCC taggcttctg gcccaagtag attttccggt tcttgttggg ccgacgatgt

2281 agaggctctg ctttcttgat ctttgatctg atgactggat acagaatcca tccattggag 

2341 GTCAGAAATT GCATCCTCGA GGGTATAACA GGTAGGTTGA AGGAGGATGT AAGCTTCGGG

2401 ACTAACCTGG AAGATGTTAG GCTGGAGCCA ATCGTTGATT GACTCATTAC AAAGTAAATC

2461 AGGTGAGGAG GGTGGATGAG GATTGGTGAA CTCTTCCTGA ATCTCAGGAA AAAGCTTATT

2521 TGCAGAGTAT tcaaaatact gcaattttgt ggaccaatca aaggggagct ctttctggat

2581 CATGGAGAGG tagtcttctt tggaggtagc gtgtgaaata atgtctcgca ttatttcatc

2641 TTTAGAAGGC TTTTTTTCCT ttacctctga atcagatttt cctaggaagg gggacttcct

2701 AGGAATGAAA GTACCTCTCT caaacacagc cagaggttcc TTGAGAATGT aatccctcac

2761 TCTGTTAACT GACTTGGCAC TCTGAATATT TGGGTGAAAC CCATTTATAT CAAAGAACCT 

2821 TGAGTCAGAT ATCGTTATCG GCTTCTCTGG CTGAAGCAAT GCATGTAAAT GCAAACTTCC

2881 ATCTTTATGT GCCTCTCGGG CACATAGAAT ATATTTGGGA ATCCAACGAA CGACGAGCTC
Figure 7 (cont'd)



CCAGATCATC TGACAGGCGA TTTCAGGATT TTCTGGACAC TTTGGATAGG TTAGGAACGT 
SSSSS Sg^gtcaga ACTGACGGTT GGATGAGGAG GAGGCCATAG CCGACGACGG 
ag^SSc TCAGGGATGG CAGACTGGGA GCTCCAAACT CTATAGTATA cccgtgcgcc 
J^gIS?cc gccgctccat tgtcttatag togttgtaaa tgggccggac cgggccggcc 

cIS^AAA AGAAGGCGCG CA CTAATATT ACCGCGCCTT CTTTTCCTGC GAGGGCCCGG 

SSSSacc gagcgct-ttg atttaaagcc tggttctgct ttgtatgatt tatctaaagc 

SSSSS AAAGAAACCG GTCCCGGGCA CTATAAATTG CCTAACAAGT G CGATTCATT 
riSSSiSS TTAAACTCGA GTCTAGAGGG cccaattcgc CCTATAGTGA GTCGTATTAC 
SSSSS CCGTCGTTTT ACAACGTCGT GACTGGGAAA ACCCTGGCGT TACCCAACTT 
AAT<SS^ CAGCACATCC CCCTTTCGCC AGCTGGCGTA ATAGCGAAGA GGCCCGCACC 
Gt^S^S? CCCAACAGTT GCGCAGCCTA TACGTACGGC AGTTTAAGGT TTACACCTAT 
aaaSaSga gcSJtatcg TCTGTTTGTG GATGTACAGA GTGATATTAT TGACACGCCG 
SS^gga ?SSatccc CCTGGCCAGT gcacgtctgc TGTCAGATAA AGTCTCCCGT 

Ittl SaacS^J SJSSca tatcggggat gaaagctcgc gcatgatgac caccgatatg 

llll SSSSSS CGGXCTCCGX XATCGGGGAA GAAGXGGC^ «^ 
3841 
3961
4081
gacatcaaaa acgccattaa cctgatgttc tggggaatat AAATGTCAGG CCTGAATGGC

SSgSSS ScCCXGTAGC GGCGCATTAA GCGCGCGGGT GTGGTGGTTA CGCGCAGCGT 
ScStS SScCAGCG CCCTAGCGCC CGCTCCTTTC GCTITCTTCC CTTCCTTTCT 
ScCAOGTTC GCCGGCTTTC CCCGTCAAGC TCTAAATCGG GGGCTCCCTT TAGGGTTCCG 
SSagagct ttacggcacc TCGACCGCAA AAAACTTGAT. TTGGGTGATG GTTCACGTAG 
4143 CCCTGATAGA CGGTTTTTCG CCCTTTGACG TTGGAGTCCA CGtXCTTTAA 

till TaSSg^Sc TTGTTCCAAA CTGGAACAAC ACTCAACCCT ATCGCGGTCT ATTCTrrXGA 
££' TTTATAAGGG ATGTTGCCGA TTTCGGCCTA TTGGTTAAAA AATGAGCTGA TTTAACAAAA 

aattcagaag aactcgtcaa gaaggcgata gaaggcgatg cgctgcgaat 
SgSg^Sc gataccgtaa agcacgagga agcggtcagc ccattcgccg ccaagctctt 
SSSatSS aSSSgcc aacgctatgt. cctgatagcg gtccgccaca cccagccggc 
ScStcgat gaatccagaa aagcggccat tttccaccat gatattcggc aagcaggcat 
Sc^tSS? Scgacgaga tcctcgccgt cgggcatgct cgccttgagc ctggcgaaca 
S5SgSS cScSagcccc tgatgctctt cgtccagatc atcctgatcg acaagaccgg 

•StSSS AGTACGTGCT CGCTCGATGC GATGTTTCGC TTGGTGGTCG "AATGGGCAGG 
SSggS ScGTATGe AGCCGCCGCA TTG CATC AG C CATGATGGAT ACTTTCTCGG 

SSScaS Stoagatgac aggagatcct gccccggcac ttcgcccaat agcagccagt 

SSSSc TTCAGTGACA ACGTCGAGCA CAGCTGCGCA AGGAACG CCC GTCGTGGCCA 
^cSgaS CCGCGCTGCC TCGTCTTGCA GTTCATTCAG GGCACCGGAC AGGTCGGTCT 

tSSSaSg SScSSSc CCCTGCGCTC ACAGCCGGAA cacggcggca tcagagcagc 
SJSg^ctI t^tgcccag tcatagccga atagcctctc cacccaagcg gccggagaac 
SSStScS tSatSSt tcaatcatgc gaaacgatcc tcatcctgtc tcttoatcag 
SSSStcc cctgcgccat cagatccttg gcggcgagaa agccatccag tttactttgc 
aSSSS SSSacca gagggcgccc cagctcgcaa ttccggttcg cttgctgtcc 
ataaSSc ccagtctagc tatcgccato taagcccact gcaagctacc tgctttctct 
J^Jc^Sc gSSSctt gtccagatag cccagtagct gacattcatc cggggtcagc 
aSSSctc cggactggct ttctacgtga aaaggatcta ggtgaagatc cttttjgata 
at?tStS? SaSSS taacgtgagt rrrcGrxccA ctgagcgtca gaccccgtag 

SSaSJSa AGGATCTTCT TGAGATCCTT TTTTTCTGCG CGTAATCTGC TGCTTGCAAA 

2aaSacc accgctacca gcggtggttt gtttgccgga tcaagagcta ccaactcttt 
aactggcttc agcagagcgc agataccaaa tactgtcctt ctagtgtagc 
SSttagg ccaccacttc aagaactctg tagcaccgcc tacatacctc gctctgctaa 
tcSSJaS JSEStoct gccagtggcg ataagtcgtg tcttaccggg ttggactcaa 
gacgISStt accggataag gcgcagcggt cgggctgaac ggggggttcg tgcacacagc 
cc^SrS SSaacgacc tacaccgaac igagatacct acagcgtgag ctatgagaaa 
gScSSc? tScgaaggg agaaaggcgg acaggtatcc ggtaagcggc agggtcggaa 
SSSScg cacgagggag cttccagggg gaaacgcctg gtatctttat agtcctgtcg 
StScSSca cctctgactt gagcgtcgat ttttgtgatg ctcgtcaggg gggcggagcc 
tSSSaaa" cgccagcaac gcggcctttt tacggttcct gggcttttgc tggccttttg 
6 3 b i cSSSStt" SSSctgcg ttatcccctg attctgtgga taaccgtatt accgcctttg 
till aSSSSgI tISgctcgc cgcagccgaa cgaccgagcg cagcgagtca gtgagcgagg 
Figure 8



pMSVLSB-5: 8043 bp;

Composition 1983 A; 1992 C; 2011 G; 2057 T; 0 OTHER
Percentage: 25% A; 25% C; 25% G; 26% T; 0% OTHER

Molecular Weight (kDa): ssDNA: 2483.31 dsDNA: 4958.5

ORIGIN

1 AGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC

61 ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC 

121 TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA 

181 TTGTGAGCGG ATAACAATTT CACACAGGAA ACAGCTATGA CCATGATTAC GCCAAGCTAT 

241 TTAGGTGACA CTATAGAATA CTCAAGCTAT GCATCAAGCT TGGTACCGAG CTCGGATCCA 

301 CTAGTAACGG CCGCCAGTGT GCTGGAATTC ATGGGCAGAC CCGTCTGTAC TTTAAGAGTG 

361 TTGGCAAGCA GTAATGAATA AAAACTCCCG TTTTATTATA TTTGATGAAT GCTGAAAGCT 

421 TACATTAATA TGTCGTGCGA TGGCACGAAA AAACACACGC AAACAATACA GGGGGGTAGT

481 CGGCGGGCGG CTAAGGGTGG TGCTCGGCGG GCAGAACATC GAAAAATCAA GATCTATATG 

541 AATTACACTT CCTCCGTAGG AGGAAGCACA GGGGGAGAAT ACCACTTCTC CCCCGGCGAC

601 ATAATGTAAA TGACGCAGTT TGCCTCGAAA TACTCCAGCT GCCCTGGAGT CATTTCCTTC

661 ATCCAATCTT CATCCGAGTT GGCGAGGATT ATTGTAGGCT TAGACTTCTT CTGCACCTTT 

721 TTCTTCTTAC CATACTTGGG GTTTACAATG AAATCCCTCT GACAGCCAAC TAACTGTTTC 

781 CAACAAGGAC AGAATTTAAA CGGAATATCA TCTACGATGT TGTAGATTGC GTCTTCGTTG

841 TATGAAGACC AATCAACATT ATTTTGCCAG TAATTATGAA CCCCTAGGCT TCTGGCCCAA

901 GTAGATTTTC CGGTTCTTGT TGGGCCGACG ATGTAGAGGC TCTGCTTTCT TGATCTTTCA 

961 TCTGATGACT GGATACAGAA TCCATCCATT GGAGGTCAGA AATTGCATCC TCGAGGGTAT 

1021 AACAGGTAGG TTGAAGGAGC ATGTAAGCTT CGGGACTAAC CTGGAAGATG TTAGGCTGGA 

1081 GCCAATCGTT GATTGACTCA TTACAAAGTA AATCAGGTGA GGAGGGTGGA TGAGGATTGG 

1141 TGAACTCTTC CTGAATCTCA GGAAAAAGCT TATTTGCAGA GTATTCAAAA TACTGCAATT 

1201 TTGTGGACCA ATCAAAGGGG AGCTCTTTCT GGATCATGGA GAGGTACTCT TCTTTGGAGG 

1261 TAGCGTGTGA AATAATGTCT CGCATTATTT CATCTTTAGA AGGCTTTTTT TGCTTTACCT

1321 CTGAATCAGA TTTTCCTAGG AAGGGGGACT TCCTAGGAAT GAAAGTAGCT CTCTCAAACA

1381 CAGCCAGAGG TTCCTTGAGA ATGTAATCCC TCACTCTGTT AACTGACTTG GCACTCTGAA

1441 TATTTGGGTG AAACCCATTT ATATCAAAGA ACCTTGAGTC AGATATCCTT ATCGGCTTCT 

1501 CTGGCTGAAG CAATGCATGT AAATGCAAAC TTCCATCTTT ATGTGCCTCT CGGGCACATA 

1561 GAATATATTT GGGAATCCAA CGAACGACGA GCTCCCAGAT CATCTGACAG GCGATTTCAG

1621 GATTTTCTGG ACACTTTGGA TAGGTTAGGA ACGTGTTAGC GTTCCTGTGT GAGAACTGAC 

1681 GGTTGGATGA GGAGGAGGCC ATAGCCGACG ACGGAGGTTG AGGCTGAGGG ATCGCAGACT 

1741 GGGAGCTCCA AACTCTATAG TATACCCGTG CGCCTTCGAA ATCCGCCGCT CCATTGTCTT

1801 ATAGTGGTTG TAAATGGGCC GGACCGGGCC GGCCCAGCAG GAAAAGAAGG CGCGCACTAA 

1861 TATTACCGCG CCTTCTTTTC CTGCGAGGGC CCGGTAGGGA CCGAGCGCTT TGATTTAAAG

1921 CCTGGTTCTG CTTTGTATGA TTTATCTAAA GCAGCCCAAT CTAAAGAAAC CGGTCCCGGG 

1981 CACTATAAAT TGCCTAACAA GTGCGATTCA TTCATGGATC CTTTAAACTC GAGTCTAGTC

2041 CCGATCTAGT AACATAGATG ACACCGCGCG CGATAATTTA TCCTAGTTTG CGCGCTATAT 

2101 TTTGTTTTCT ATCGCGTATT AAATGTATAA TTGCGGGACT CTAATCATAA AAACCCATCT

2161 CATAAATAAC GTCATGCATT ACATGTTAAT TATTACATGC TTAACGTAAT TCAACAGAAA

2221 TTATATGATA ATCATCGACA GACCGGCAAC AGGATTCAAT CTTAAGAAAC TTTATTGCCA 

2281 AATGTTTGAA CGATCGGGGA AATTCGCTCG AGTTAATTAA GCGGCCGCCT CAAAAAGGAT

2341 CTTCACCTAG ATCCTTTTAA ATTAAAAATG AAGTTTTAGC ACGTGTCAGT CCTGCTCCTC 

2401 GGCCACGAAG TGCACGCAGT TGCCGGCCGG GTCGCGCAGG GCGAACTCCC GCCCCCACGG 

2461 CTGCTCGCCG ATCTCGGTCA TGGCCGGCCC GGAGGCGTCC CGGAAGTTCG TGGACACGAC

2521 CTCCGACCAC TCGGCGTACA GCTCGTCCAG GCCGCGCACC CACACCCAGG CCAGGGTGTT

2581 GTCCGGCACC ACCTGGTCCT GGACCGCGCT GATGAACAGG GTCACGTCGT CCCGGACCAC 

2641 ACCGGCGAAG TCGTCCTCCA CGAAGTCCCG GGAGAACCCG AGCCGGTCGG TCCAGAACTC 

2701 GACCGCTCCG GCGACGTCGC GCGCGGTGAG CACCGGAACG GCACTGGTCA ACTTGGCCAT 

2761 GGTGGCCCTC CTCACGTGCT ATTATTGAAG CATTTATCAG GGTTATTGTC TCATGAGCGG 

2821 ATACATATTT GAATGTATTT AGAAAAATAA ACAAATAGGG GTTCCGCGCA CATTTCCCCG 

2881 AAAAGTGCCA CCTGTATGCG GTGTGAAATA CCGCACAGAT GCGTAAGGAG AAAATACCGC 
Figure 8 (cont'd)
ATCAGGCGAA ATTGTAAACG CGGCCGCTTA ATTAAGTCGA CGTCCTCTCC AAATGAAATG
AACTTCCTTA TATAGAGGAA GGGTCTTGCG AAGGATAGTG GGATTGTGCG TCATCCCTTA

cStSgtgga gatatcacat caatccactt GCTTTGAAGA CGTGGTTGGA acgtcttctt 
SSaSa gScctcgto ggtgggggtc catctttggg accactgtcg gcagaggcat 
c^SSat AGCcmcci tatcgcaatc atggcatttg taggtgcgac cttccttttc 
JISgISSt ttgatgaagt gacagatagc tgggcaatgg aatccgagga ggtttcccga 
Sraeccrr tcitcaaaag tctcaatagc cctttggtct tctgagactg tatctttgat 

ATTCTTGGAG TAGACGAGAG AGTGTCGTGC TCCACCATGT TGACGAATTC ATGGGCAGAC

ScSJSSa? Staagagtg ttcgcaacca gtaatgaata aaaactcccg ™ttata 

TTTGATGAAT GCTGAAAGCT TACATTAATA TGTCGTGCGA TGGCACGAAA AAACACACGC 
AAACAATACA GGGGGGTAGT CGGCGGGCGG CTAAGGGTGG TGCTCGGCGG gcagaacatc 

Saatcaa gatctatatg aattacactt cctccgtagg aggaagcaca gggggagaat 
SSSSSc ccccggcgac ataatgtaaa tgacgcagtt tgcctcgaaa tactccagct 
gccctggagt caittccttc atccaatctt catccgagtt ggcgaggatt attgtaggct 
tagacttctt ctgcaccttt ttcttgttac catacttggg gtttacaatg aaatccctct 

GACAGCCAAC TAACTGTTTC CAACAAGGAC AGAATTTAAA CGGAATATCA TCTACGATGT
TGTAGATTGC GTCTTCGTTG TATCAAGACC AATCAACATT ATTTTGCCAG TAATTATGAA
CCCCTAGGCT TCTGGCCCAA GTAGATTTTC CGGTTCTTGT TGGGCCGACG ATGTAGAGGC 
TCTGCTTTCT TGATCTTTCA TCTGATGACT GGATACAGAA TCCATCCATT GGAGGTCAGA 
AATTGCATCC TCGAGGGTAT AACAGGTAGG TTGAAGGAGC ATGTAAGCTT CGGGACTAAC 
CTGGAAGATG TTAGGCTGGA GCCAATCGTT GATTGACTCA TTACAAAGTA AATCAGGTGA 
Sagggtgga TGAGGATTGG TGAACTCTTC CTGAATCTCA GGAAAAAGCT TATTTGCAGA 
gtaScaaaa tactgcaatt ttgtggacca ATCAAAGGGG AGCTCTTTCT GGATCATGGA 
GAGGTACTCT TCTTTGGAGG TAGCGTGTGA AATAATGTCT CGCATTATTT CATCTTTAGA 

a^Stttt tcctttacct ctoaatcaga TTTTCCTAGG aagggggact tcctaggaat 
SaaSSS SScaaaca cagccagagg ttccttgaga atgtaatccc tcactctgtt 
aScS Sactctgaa tattcgggtg aaacccattt atatcaaaga acctogagtc 
agata?cctt atcggcttct ctggctgaag caatccatgt aaatgcaaac ttccatcttt 
StcSSS cSSSxa gaatatattt gggaatccaa cgaacgacga gctcccagat 
Sa?c^acag gcgatttcag gattttctgg acactttgga taggttagga acgtgttagc 
SSc?SSt gagaactgac ggttggatga ggaggaggcc atagccgacg acggaggttg 
aSSSggg atggcagact gggagctcca aactctatag tatacccgtg cgccttcgaa 
atcSSSt CCATTGTCrrr atagtggttg taaatgggcc ggaccgggcc ggcccagcag 
paaaaSSgg Scgcactaa tattaccgcg ccrrcxTtrrc ctgcgagggc ccggggtagg 
g^SSSSS SSSSaa agcctggttc tgctttgtat gatttatcta aagcagccca 
aSaaS aSSScg ggcactataa attgcctaac aagtgcgatt. cattcatgga 

TCCTTTAAAC TCGAGTCTAG AGGGCCCAAT TCGCCCTATA GTGAGTCGTA TTACAATTCA

SScSS TTTTACAACG tcgtgactgg gaaaaccctg gcgttaccca acttaatcgc 

CTTGCAGCAC ATCCCCCTTT CGCCAGCTGG CGTAATAGCG AAGAGGCCCG CACCGATCGC
SSScAAC AGTTGCGCAG CCTATACGTA CGGCAGTTTA AGGTTTACAC CTATAAAAGA 
GAGAGCCGTT ATCGTCTGTT TGTGGATGTA cagagtcata ttattgacac gccggggcga 
cXKSSga ScJctcgc cagtgcacgt ctgctgtcag ataaagtctc ccgtgaactt 
SSStcg Sa^Scgg ggatgaaagc tggcgcatga tgaccaccga tatggccagt 
gtccSgS ccgttatcgg ggaagaagtc gctgatctca gccaccgcga aaatgacatc 
aaaaScca ttaacctcat gttctgggga atataaatgt caggcctgaa tggcgaatgg 
£SScc5S tagcggcgca ttaagcgcgc gggtotogtg gttacgcgca gcgtgaccgc 

SScScC AGCGCCCTAG CGCCCGCTCC TTTCGCTTTC TTCCCTTCCT TTCTCGCCAC 

GTTCGCCGGC tttccccgtc aagctctaaa tcgggggctc cctttagggt tccgatttag 
aSSacgg cacctcgacc gcaaaaaact tgatttgggt gatggttcac gtagtgggcc 
a?£Sc?ga SgSgttt ttcgcccttt gacgttcgag tccacgttct ttaatag^ 
JSStgttc caaactggaa caacactcaa ccctatcgcg gtctattctt ttgaittata 
aSgatgttg ccgatttcgg c ct attggtt aaaaaatgag ctgatttaac aaaaatttta 

ACAAAATTCA GAAGAACTCG TCAAGAAGGC GATAGAAGGC GATGCGCTGC GAATCGGGAG

cggcgatacc gtaaagcacg aggaagcggt cagcccattc gccgccaagc tcttcagcaa

TATCACGGGT AGCCAACGCT ATGTCCTGAT AGCGGTCCGC CACACCCAGC CGGCCACAGT 

CGATGAATCC agaaaagcgg ccattttcca ccatgatatt cggcaagcag gcatcgccat

GGGTCACGAC GAGATCCTCG CCGTCGGGCA TGCTCGCCTT GAGCCTGGCG AACAGTTCGG 
Figure 8 (cont'd)



CTGGCGCGAG CCCCTGATCC TCTTCGTCCA GATCATCCTG ATCGACAAGA CCGGCTTCCA 
SSSSS TGCTCGCTCG ATGCGATGTT TCGCTTGGTG GTCGAATGGG CAGGTAGCCG 
gSSaaJSS? ATGCAGCCGC CGCATTGCAT CAGCCATGAT GGATACTTTC TCGGCAGGAG 
SSSg^a Sacaggaga tcctgccccg GCACTTCGCC caatagcagc CAGTCCCTTC 
SSSJSgt gacaacgtcg agcacagctg cgcaaggaac gcccgtcgtg gccagccacg 
*™SS2gc Scctcgtct tgcagttcat tcagggcacc ggacaggtcg gtcttgacaa 
SScccSS Stgacagcc ggaacacggc ggcatcagag cagcogattg 

TCTGTTGTGC SS£SS£ CCGAATAGCC TCTCCACCCA AGCGGCCGGA GAACCTGCGT 
SSSSSSc TTGTTCAATC ATGCGAAACG ATCCTCATCC TGTCTCTTGA TCAGATCTTG 
SSScSScG CCATCAGATC CTTGGCGGCG AGAAAGCCAT CCAGTTTACT TTGCAGGGCT 
TCCCAACCTT ACCAGAGGGC GCCCCAGCTG GCAATTCCGG TTCGCTTGCT GTCCATAAAA
SScAGTC TAGCTATCGC CATGTAAGCC CACTGCAAGC TACCTGCTTT CTCrTCGCGC 
CCTTGTCCAG ATAGCCCAGT AGCTGACATT CATCCGGGGT CAGCACCGTT 

Jctgcggagt ggctttctac gtoaaaagga tctaggtgaa gatccttctt GATAATCTCA 

JScSaat CCCTTAACGT GAGTTTTCGT TCCACTGAGC GTCAGACCCC GTAGAAAAGA 
TCAAAGGATC TTCTTGAGAT CCTTTTTTTC TGCGCGTAAT CTGCTGCTTG CAAACAAAAA

7321 AACCACCGCT ACCAGCGGTG GTTTGTTTGC CGGATCAAGA GCTACCAACT CTTTTTCCGA

7381 AGGTAACTGG CTTCAGCAGA GCGCAGATAC CAAATACTGT CCTTCTAGTG TAGCCGTAGT

7441 TAGGCCACCA CTTCAAGAAC TCTGTAGCAC CGCCTACATA CCTCGCTCTG CTAATCCTGT
?aSS?ggc TGCTGGCAGT GGCGATAAGT CGTGTCTTAC CGGGTTGGAC xcaagacgat 

aSSaccggX taaggcgcag cggtcgggct gaacgggggg ttcgtgcaca cagcccagct 
SSSaac GACCTACACC GAACTGAGAT ACCTACAGCG tgagctatga gaaagcgcca 
SS?ScSa agggagaaag gcggacaggt atccggtaag cggcagggtc ggaacaggag 

AGCGCACGAG GGAGCTTCCA GGGGGAAACG CCTGGTATCT TTATAGTCCT GTCGGGTTTC 
ScSc^SS ACTTGAGCGT CGATTTTTGT GATGCTCGTC AGGGGGGGGG AGCCTATGGA 
SScGCCAG CAACGCGGCC TTTTTACGGT TCCTGGGCTT TTGCTGGCCT TTTGCTCACA 
TGTTCTTTCC TGCGTTATCC CCTGATTCTG TGGATAACCG TATTACCGCC TTTGAGTGAG
CTGATACCGC TCGCCGCAGC CGAACGACCG AGCGCAGCGA GTCAGTGAGC GAGGAAGCGG
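Some stretches recur within the pMSVLSB-5 listing; for example, ATGGGCAGAC appears in the line beginning at position 301 and again in the continuation. Because the constructs are intended to yield both sense and antisense transcripts, it can be useful to check whether a repeated segment occurs as written or as its reverse complement. The Python sketch below does that with plain substring tests. The helper names reverse_complement and orientations are illustrative, and no particular orientation of the inserts is implied.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Reverse complement of a DNA string (A<->T, C<->G), preserving case."""
    return seq.translate(_COMPLEMENT)[::-1]

def orientations(query, target):
    """Return ['+'] if `query` occurs in `target` as written, ['-'] if only its
    reverse complement occurs, both for palindromic hits, or [] for neither."""
    q, t = query.upper(), target.upper()
    found = []
    if q in t:
        found.append("+")
    if reverse_complement(q) in t:
        found.append("-")
    return found


print(reverse_complement("ATGGGCAGAC"))                       # GTCTGCCCAT
print(orientations("ATGGGCAGAC", "GAATTCATGGGCAGACCCGTCTG"))  # ['+']
```

A plain substring test only reports exact matches; comparing longer segments that contain sequencing or transcription differences would call for an alignment tool instead.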
Figure 9



pMSVLSB-6: 7404 bp; 

Composition 1839 A; 1794 C; 1835 G; 1936 T; 0 OTHER
Percentage: 25% A; 24% C; 25% G; 26% T; 0% OTHER

Molecular Weight (kDa): ssDNA: 2286.33 dsDNA: 4564.5

AGCGCCCAAT ACGCAAACCG CCTCTCCCCG CGCGTTGGCC GATTCATTAA TGCAGCTGGC

ACGACAGGTT TCCCGACTGG AAAGCGGGCA GTGAGCGCAA CGCAATTAAT GTGAGTTAGC

TCACTCATTA GGCACCCCAG GCTTTACACT TTATGCTTCC GGCTCGTATG TTGTGTGGAA
TTGTGAGCGG ATAACAATTT CACACAGGAA ACAGCTATGA CCATGATTAC GCCAAGCTAT

TTAGGTGACA CTATAGAATA CTCAAGCTAT GCATCAAGCT TGGTACCGAG CTCGGATCCA 

CTAGTAACGG CCGCCAGTGT GCTGGAATTC ATGGGCAGAC CCGTCTGTAC TTTAAGAGTG
SSScScca gtaatgaata aaaactcccg ttttattAta tttgatgaat gctgaragct 
SSSaata tgtggtgSa tcgcacgaaa aaacagacgc aaacaataca GGGGGGTAGT 
3g?gggcgg CTAAGGGTCG TGCTCGGCGG GCAGAACATC gaaaaatcaa GATCTATATG 
SSSSt cctccgtagg aggaagcaca gggggagaat accacttctc ccccggcgac 

ATAATGTAAA TGACGCAGTT TGCCTCGAAA TACTCCAGCT GCCCTGGAGT CATTTCGTTC
Jt^StT CATCCGAGTT GG CG AGG ATT ATTGTAGGCT TAGACTTCTT CTGCACCTTT 
T^CTTCTTAC CATACTTGGG GTTTACAATG AAATCCCTCT GACAGCCAAC TAACTGTTTC 
SSSSc AGAATTTAAA CGGAATATCA TCTACGATGT TGTAGATTGC GTCTTCGTTG 
S^g^c Scaacatt ATTTTGCCAG TAATTATGAA CCCCTAGGCT ^TGGCCCAA 
GTAGATTTTC CGGTTCTTGT TGGGCCGACG ATGTAGAGGC TCTGCTTTCT TGATCTTTCA 

SStgaS gStacagaa tccatccatt ggaggtcaga aattgcatcc tcgagggtat 
SSggtagg ttgaaggagc atgtaagctt cgggactaac ctggaagatg ttaggctgga 
SScaat^S gattgrctca ttacaaagta aatcaggtga ggagggtgga tgaggattgg 
tSaactcttc ctgaatctca ggaaaaagct tatttgcaga gtattcaaaa .tactgcaatt 
t^Sg^cS atSaaaSgg agctctttct ggatcatgga gaggtactct tctttggagg 
tISStStS aataatgtct cgcattattt catctttaga aggctttttt tcctttacct 
SSaatcaga ttttcctagg aagggggact tcctaggaat gaaagtacct ctctcaaaca 
Sgccagagg ttccttgaga atgtaatccc tcactctgtt aactgacttg gcactctgaa 
SSSgS aaaSStS atatcaaaga accttgagtc agatatcctt atcggcttct 

CAATGCATGT AAATGCAAAC TTCCATCTTT ATGTGCCTCT CX3GGCACATA 

gSSSS gggHtccaa CGAACGACGA GCTCCCAGAT catctgacag gcgatttcag 

GATTTTCTGG ACACTTTGGA TAGGTTAGGA ACGTGTTAGC GTTCCTGTGT GAGAACTGAC 
SSSSK GGAGGAGGCC ATAGCCGACG ACGGAGGTTG AGGCTGAGGG ATGGCAGACT 
n^rnri aaTTTTATAG TATACCCGTG CGCCTTCGAA ATCCGCCGCT CCATTGTCTT 
16 03 aSStS TAAa££gC*C SSSScC GGCCCAGCAG GAAAAGAAGG ^C^CACTAA 
J!£ StTACCGCG CCTTCTTTTC CTGCX3AGGGC CCGGTAGGGA CCGAGCGCTT TGATTTAAAG 
rcTGGTTCTG StTGTATGA TTTATCTAAA GCAGCCCAAT CTAAAGAAAC CGGTCCCGGG 

192 SSataaa? 5gcSSS gtgcgattca ttcatggatc ctttaaactc gagtctagtc 

PTGATCTAGT AACATAGATG ACACCGCGCG CGATAATTTA TCCTAGTTTG CGCGCTATAT 

SSxtSS aaa^tataa ttgcgggact ctaatgataa aaacccatct 

CATAAATAAC GTCATGCATT ACATGTTAAT TATTACATGC TTAACGTAAT TCAACAGAAA 
ttatatgata atcatcgaca GACCGGCAAC AGGATTCAAT cttaagaaac TTTATTGCCA 
aItoSaa CGATCGGGGA AATTCGCTCG AGTTAATTAA GCGGCCGCTT aattaagtcg 
ACOTCCTCTC CAAATGAAAT GAACTTCCTT ATATAGAGGA AGGGTCTTGC GAAGGATAGT 

Sggattgtgc gtcatccctt acgtcagtgg agatatcaca tcaatccact tgctttgaag 
SSgg aacgtcttct ttttccacgt agctcctcgt gggtgggggt ccatctttgg 
gacSSgtI SSSgSS tcttgaacga tagcctttcc ttatcgcaat gatggcattt 
g?aggtgcca ccitcctttt ctactgtcct tttgatgaag tgacagatag ctgggcaatg 
SSSSS aggtttcccg atattaccct ^ttgaaaa gtctcaatag ccctttggtc 

TTCTGAGACT GTATCTTTGA TATTCTTGGA GTAGACGAGA GAGTGTCGTG CTCCAUUH 

Sgacgaatt catgggcaga cccgtctgta ctttaagagt gttggcaacc agtaatgaat 

JS^SJS GTTTTATTAT ATTTGATGAA TGCTGAAAGC TTACATTAAT ATGTCGTGCG 



ORIGIN 

1 

61 
121 
181 
241 
301 
361 
421 
481 
541 
601 
661 
721 
781 
841 
90% 
961 
1021 
1081 
1141 
1201 
1261 
1321 
1381 
1441 
1501 
1561 
1621 
i681 
1741 
1801 



1981 
2041 
2101 
2161 
2221 
2281 
2341 
2401 
2461 
2521 
2581 
2641 
2701 
2761 
2821 
Figure 9 (cont'd)



2881 ATGGCACGAA AAAACACACG CAAACAATAC AGGGGGGTAG TCGGCGGGCG GCTAAGGGTG
2941 GTGCTCGGCG GGCAGAACAT CGAAAAATCA AGATCTATAT GAATTACACT TCCTCCGTAG
3001 GAGGAAGCAC AGGGGGAGAA TACCACTTCT CCCCCGGCGA CATAATGTAA ATGACGCAGT
3061 TTGCCTCGAA ATACTCCAGC TGCCCTGGAG TCATTTCCTT CATCCAATCT TCATCCGAGT
3121 TGGCGAGGAT TATTGTAGGC TTAGACTTCT TCTGCACCTT TTTCTTCTTA CCATACTTGG
3181 GGTTTACAAT GAAATCCCTC TGACAGCCAA CTAACTGTTT CCAACAAGGA CAGAATTTAA
3241 ACGGAATATC ATCTACGATG TTGTAGATTG CGTCTTCGTT GTATGAAGAC CAATCAACAT
3301 TATTTTGCCA GTAATTATGA ACCCCTAGGC TTCTGGCCCA AGTAGATTTT CCGGTTCTTG
3361 TTGGGCCGAC GATGTAGAGG CTCTGCTTTC TTGATCTTTC ATCTGATGAC TGGATACAGA
3421 ATCCATCCAT TGGAGGTCAG AAATTGCATC CTCGAGGGTA TAACAGGTAG GTTGAAGGAG
3481 CATGTAAGCT TCGGGACTAA CCTGGAAGAT GTTAGGCTGG AGCCAATCGT TGATTGACTC
3541 ATTACAAAGT AAATCAGGTG AGGAGGGTGG ATGAGGATTG GTGAACTCTT CCTGAATCTC
3601 AGGAAAAAGC TTATTTGCAG AGTATTCAAA ATACTGCAAT TTTGTGGACC AATCAAAGGG
3661 GAGCTCTTTC TGGATCATGG AGAGGTACTC TTCTTTGGAG GTAGCGTGTG AAATAATGTC
3721 TCGCATTATT TCATCTTTAG AAGGCTTTTT TTCCTTTACC TCTGAATCAG ATTTTCCTAG
3781 GAAGGGGGAC TTCCTAGGAA TGAAAGTACC TCTCTCAAAC ACAGCCAGAG GTTCCTTGAG
3841 AATGTAATCC CTCACTCTGT TAACTGACTT GGCACTCTGA ATATTTGGGT GAAACCCATT
3901 TATATCAAAG AACCTTGAGT CAGATATCCT TATCGGCTTC TCTGGCTGAA GCAATGCATG
3961 TAAATGCAAA CTTCCATCTT TATGTGCCTC TCGGGCACAT AGAATATATT TGGGAATCCA
4021 ACGAACGACG AGCTCCCAGA TCATCTGACA GGCGATTTCA GGATTTTCTG GACACTTTGG
4081 ATAGGTTAGG AACGTGTTAG CGTTCCTGTG TGAGAACTGA CGGTTGGATG AGGAGGAGGC
4141 CATAGCCGAC GACGGAGGTT GAGGCTGAGG GATGGCAGAC TGGGAGCTCC AAACTCTATA
4201 GTATACCCGT GCGCCTTCGA AATCCGCCGC TCCATTGTCT TATAGTGGTT GTAAATGGGC
4261 CGGACCGGGC CGGCCCAGCA GGAAAAGAAG GCGCGCACTA ATATTACCGC GCCTTCTTTT
4321 CCTGCGAGGG CCCGGGGTAG GGACCGAGCG CTTTGATTTA AAGCCTGGTT CTGCTTTGTA
4381 TGATTTATCT AAAGCAGCCC AATCTAAAGA AACCGGTCCC GGGCACTATA AATTGCCTAA
4441 CAAGTGCGAT TCATTCATGG ATCCTTTAAA CTCGAGTCTA GAGGGCCCAA TTCGCCCTAT
4501 AGTGAGTCGT ATTACAATTC ACTGGCCGTC GTTTTACAAC GTCGTGACTG GGAAAACCCT
4561 GGCGTTACCC AACTTAATCG CCTTGCAGCA CATCCCCCTT TCGCCAGCTG GCGTAATAGC
4621 GAAGAGGCCC GCACCGATCG CCCTTCCCAA CAGTTGCGCA GCCTATACGT ACGGCAGTTT
4681 AAGGTTTACA CCTATAAAAG AGAGAGCCGT TATCGTCTGT TTGTGGATGT ACAGAGTGAT
4741 ATTATTGACA CGCCGGGGCG ACGGATGGTG ATCCCCCTGG CCAGTGCACG TCTGCTGTCA
4801 GATAAAGTCT CCCGTGAACT TTACCCGGTG GTGCATATCG GGGATGAAAG CTGGCGCATG
4861 ATGACCACCG ATATGGCCAG TGTGCCGGTC TCCGTTATCG GGGAAGAAGT GGCTGATCTC
4921 AGCCACCGCG AAAATGACAT CAAAAACGCC ATTAACCTGA TGTTCTGGGG AATATAAATG
4981 TCAGGCCTGA ATGGCGAATG GACGCGCCCT GTAGCGGCGC ATTAAGCGCG CGGGTGTGGT
5041 GGTTACGCGC AGCGTGACCG CTACACTTGC CAGCGCCCTA GCGCCCGCTC CTTTCGCTTT
5101 CTTCCCTTCC TTTCTCGCCA CGTTCGCCGG CTTTCCCCGT CAAGCTCTAA ATCGGGGGCT
5161 CCCTTTAGGG TTCCGATTTA GAGCTTTACG GCACCTCGAC CGCAAAAAAC TTGATTTGGG
5221 TGATGGTTCA CGTAGTGGGC CATCGCCCTG ATAGACGGTT TTTCGCCCTT TGACGTTGGA
5281 GTCCACGTTC TTTAATAGTG GACTCTTGTT CCAAACTGGA ACAACACTCA ACCCTATCGC
5341 GGTCTATTCT TTTGATTTAT AAGGGATGTT GCCGATTTCG GCCTATTGGT TAAAAAATGA
5401 GCTGATTTAA CAAAAATTTT AACAAAATTC AGAAGAACTC GTCAAGAAGG CGATAGAAGG
5461 CGATGCGCTG CGAATCGGGA GCGGCGATAC CGTAAAGCAC GAGGAAGCGG TCAGCCCATT
5521 CGCCGCCAAG CTCTTCAGCA ATATCACGGG TAGCCAACGC TATGTCCTGA TAGCGGTCCG
5581 CCACACCCAG CCGGCCACAG TCGATGAATC CAGAAAAGCG GCCATTTTCC ACCATGATAT
5641 TCGGCAAGCA GGCATCGCCA TGGGTCACGA CGAGATCCTC GCCGTCGGGC ATGCTCGCCT
5701 TGAGCCTGGC GAACAGTTCG GCTGGCGCGA GCCCCTGATG CTCTTCGTCC AGATCATCCT
5761 GATCGACAAG ACCGGCTTCC ATCCGAGTAC GTGCTCGCTC GATGCGATGT TTCGCTTGGT
5821 GGTCGAATGG GCAGGTAGCC GGATCAAGCG TATGCAGCCG CCGCATTGCA TCAGCCATGA
5881 TGGATACTTT CTCGGCAGGA GCAAGGTGAG ATGACAGGAG ATCCTGCCCC GGCACTTCGC
5941 CCAATAGCAG CCAGTCCCTT CCCGCTTCAG TGACAACGTC GAGCACAGCT GCGCAAGGAA
6001 CGCCCGTCGT GGCCAGCCAC GATAGCCGCG CTGCCTCGTC TTGCAGTTCA TTCAGGGCAC
6061 CGGACAGGTC GGTCTTGACA AAAAGAACCG GGCGCCCCTG CGCTGACAGC CGGAACACGG
6121 CGGCATCAGA GCAGCCGATT GTCTGTTGTG CCCAGTCATA GCCGAATAGC CTCTCCACCC
6181 AAGCGGCCGG AGAACCTGCG TGCAATCCAT CTTGTTCAAT CATGCGAAAC GATCCTCATC
6241 CTGTCTCTTG ATCAGATCTT GATCCCCTGC GCCATCAGAT CCTTGGCGGC GAGAAAGCCA
6301 TCCAGTTTAC TTTGCAGGGC TTCCCAACCT TACCAGAGGG CGCCCCAGCT GGCAATTCCG
6361 GTTCGCTTGC TGTCCATAAA ACCGCCCAGT CTAGCTATCG CCATGTAAGC CCACTGCAAG
6421 CTACCTGCTT TCTCTTTGCG CTTGCGTTTT CCCTTGTCCA GATAGCCCAG TAGCTGACAT
6481 TCATCCGGGG TCAGCACCGT TTCTGCGGAC TGGCTTTCTA CGTGAAAAGG ATCTAGGTGA
6541 AGATCCTTTT TGATAATCTC ATGACCAAAA TCCCTTAACG TGAGTTTTCG TTCCACTGAG
6601 CGTCAGACCC CGTAGAAAAG ATCAAAGGAT CTTCTTGAGA TCCTTTTTTT CTGCGCGTAA
6661 TCTGCTGCTT GCAAACAAAA AAACCACCGC TACCAGCGGT GGTTTGTTTG CCGGATCAAG
6721 AGCTACCAAC TCTTTTTCCG AAGGTAACTG GCTTCAGCAG AGCGCAGATA CCAAATACTG
6781 TCCTTCTAGT GTAGCCGTAG TTAGGCCACC ACTTCAAGAA CTCTGTAGCA CCGCCTACAT
6841 ACCTCGCTCT GCTAATCCTG TTACCAGTGG CTGCTGCCAG TGGCGATAAG TCGTGTCTTA
6901 CCGGGTTGGA CTCAAGACGA TAGTTACCGG ATAAGGCGCA GCGGTCGGGC TGAACGGGGG
6961 GTTCGTGCAC ACAGCCCAGC TTGGAGCGAA CGACCTACAC CGAACTGAGA TACCTACAGC
7021 GTGAGCTATG AGAAAGCGCC ACGCTTCCCG AAGGGAGAAA GGCGGACAGG TATCCGGTAA
7081 GCGGCAGGGT CGGAACAGGA GAGCGCACGA GGGAGCTTCC AGGGGGAAAC GCCTGGTATC
7141 TTTATAGTCC TGTCGGGTTT CGCCACCTCT GACTTGAGCG TCGATTTTTG TGATGCTCGT
7201 CAGGGGGGCG GAGCCTATGG AAAAACGCCA GCAACGCGGC CTTTTTACGG TTCCTGGGCT
7261 TTTGCTGGCC TTTTGCTCAC ATGTTCTTTC CTGCGTTATC CCCTGATTCT GTGGATAACC
7321 GTATTACCGC CTTTGAGTGA GCTGATACCG CTCGCCGCAG CCGAACGACC GAGCGCAGCG
7381 AGTCAGTGAG CGAGGAAGCG GAAG
SEQUENCE LISTING

<110> LARGE SCALE BIOLOGY CORPORATION 

<120> COMPOSITIONS AND METHODS FOR INHIBITING 
GENE EXPRESSION 

<130> 008010177PC00 

<140> To Be Assigned 
<141> 2001-04-04 

<150> 09/545,574 
<151> 2000-04-07 

<160> 14 

<170> FastSEQ for Windows Version 3.0 

<210> 1 
<211> 27 
<212> DNA 

<213> Cauliflower mosaic virus 
<400> 1 

tttgaattcg tcaacatggt ggagcac 

<210> 2 
<211> 31 
<212> DNA 

<213> Cauliflower mosaic virus 
<400> 2 

tttgtcgacg tcctctccaa atgaaatgaa c 

<210> 3 
<211> 46 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> zeocin resistance gene 
<400> 3 

cccgtcgact taattaagcg gccgcgttta caatttcgcc tgatgc 

<210> 4 
<211> 47 
<212> DNA 

<213> Artificial Sequence 
<220> 

<223> zeocin resistance gene 
<400> 4 

cccctcgagt taattaagcg gccgcctcaa aaaggatctt cacctag 
<210> 5

<211> 32 

<212> DNA 

<213> Artificial Sequence 
<220> 

<223> nopaline synthase gene (nos) terminator sequence
<400> 5

tttctcgagc gaatttcccc gatcgttcaa ac 32

<210> 6
<211> 32
<212> DNA

<213> Artificial Sequence
<220>

<223> nopaline synthase (nos) terminator sequence
<400> 6

tttactagtc ccgatctagt aacatagatg ac 32

<210> 7

<211> 29

<212> DNA

<213> maize

<400> 7

tttttaatta aggtccgcct gaattctcg 29

<210> 8
<211> 30
<212> DNA
<213> maize

<400> 8

tttttaatta acggcaaggc tcacagtttg 30

<210> 9
<211> 4881
<212> DNA
<213> Viral

<400> 9

agcgcccaat acgcaaaccg cctctccccg cgcgttggcc gattcattaa tgcagctggc 60

acgacaggtt tcccgactgg aaagcgggca gtgagcgcaa cgcaattaat gtgagttagc 120

tcactcatta ggcaccccag gctttacact ttatgcttcc ggctcgtatg ttgtgtggaa 180

ttgtgagcgg ataacaattt cacacaggaa acagctatga ccatgattac gccaagctat 240

ttaggtgaca ctatagaata ctcaagctat gcatcaagct tggtaccgag ctcggatcca 300

ctagtaacgg ccgccagtgt gctggaattc atgggcagac ccgtctgtac tttaagagtg 360

ttggcaacca gtaatgaata aaaactcccg ttttattata tttgatgaat gctgaaagct 420

tacattaata tgtcgtgcga tggcacgaaa aaacacacgc aaacaataca ggggggtagt 480

cggcgggcgg ctaagggtgg tgctcggcgg gcagaacatc gaaaaatcaa gatctatatg 540

aattacactt cctccgtagg aggaagcaca gggggagaat accacttctc ccccggcgac 600

ataatgtaaa tgacgcagtt tgcctcgaaa tactccagct gccctggagt catttccttc 660

atccaatctt catccgagtt ggcgaggatt attgtaggct tagacttctt ctgcaccttt 720

ttcttcttac catacttggg gtttacaatg aaatccctct gacagccaac taactgtttc 780

caacaaggac agaatttaaa cggaatatca tctacgatgt tgtagattgc gtcttcgttg 840 

tatgaagacc aatcaacatt attttgccag taattatgaa cccctaggct tctggcccaa 900 

gtagattttc cggttcttgt tgggccgacg atgtagaggc tctgctttct tgatctttca 960 

tctgatgact ggatacagaa tccatccatt ggaggtcaga aattgcatcc tcgagggtat 1020 

aacaggtagg ttgaaggagc atgtaagctt cgggactaac ctggaagatg ttaggctgga 1080 

gccaatcgtt gattgactca ttacaaagta aatcaggtga ggagggtgga tgaggattgg 1140 

tgaactcttc ctgaatctca ggaaaaagct tatttgcaga gtattcaaaa tactgcaatt 1200 

ttgtggacca atcaaagggg agctctttct ggatcatgga gaggtactct tctttggagg 1260 

tagcgtgtga aataatgtct cgcattattt catctttaga aggctttttt tcctttacct 1320 

ctgaatcaga ttttcctagg aagggggact tcctaggaat gaaagtacct ctctcaaaca 1380

cagccagagg ttccttgaga atgtaatccc tcactctgtt aactgacttg gcactctgaa 1440 

tatttgggtg aaacccattt atatcaaaga accttgagtc agatatcctt atcggcttct 1500 

ctggctgaag caatgcatgt aaatgcaaac ttccatcttt atgtgcctct cgggcacata 1560 

gaatatattt gggaatccaa cgaacgacga gctcccagat catctgacag gcgatttcag 1620 

gattttctgg acactttgga taggttagga acgtgttagc gttcctgtgt gagaactgac 1680 

ggttggatga ggaggaggcc atagccgacg acggaggttg aggctgaggg atggcagact 1740

gggagctcca aactctatag tatacccgtg cgccttcgaa atccgccgct ccattgtctt 1800 

atagtggttg taaatgggcc ggaccgggcc ggcccagcag gaaaagaagg cgcgcactaa 1860 

tattaccgcg ccttcttttc ctgcgagggc ccggtaggga ccgagcgctt tgatttaaag 1920

cctggttctg ctttgcggcc gctcgagcat gcatctagag ggcccaattc gccctatagt 1980 

gagtcgtatt acaattcact ggccgtcgtt ttacaacgtc gtgactggga aaaccctggc 2040 

gttacccaac ttaatcgcct tgcagcacat ccccctttcg ccagctggcg taatagcgaa 2100 

gaggcccgca ccgatcgccc ttcccaacag ttgcgcagcc tatacgtacg gcagtttaag 2160 

gtttacacct ataaaagaga gagccgttat cgtctgtttg tggatgtaca gagtgatatt 2220 

attgacacgc cggggcgacg gatggtgatc cccctggcca gtgcacgtct gctgtcagat 2280 

aaagtctccc gtgaacttta cccggtggtg catatcgggg atgaaagctg gcgcatgatg 2340 

accaccgata tggccagtgt gccggtctcc gttatcgggg aagaagtggc tgatctcagc 2400 

caccgcgaaa atgacatcaa aaacgccatt aacctgatgt tctggggaat ataaatgtca 2460 

ggcctgaatg gcgaatggac gcgccctgta gcggcgcatt aagcgcgcgg gtgtggtggt 2520 

tacgcgcagc gtgaccgcta cacttgccag cgccctagcg cccgctcctt tcgctttctt 2580 

cccttccttt ctcgccacgt tcgccggctt tccccgtcaa gctctaaatc gggggctccc 2640 

tttagggttc cgatttagag ctttacggca cctcgaccgc aaaaaacttg atttgggtga 2700 

tggttcacgt agtgggccat cgccctgata gacggttttt cgccctttga cgttggagtc 2760 

cacgttcttt aatagtggac tcttgttcca aactggaaca acactcaacc ctatcgcggt 2820 

ctattctttt gatttataag ggatgttgcc gatttcggcc tattggttaa aaaatgagct 2880 

gatttaacaa aaattttaac aaaattcaga agaactcgtc aagaaggcga tagaaggcga 2940 

tgcgctgcga atcgggagcg gcgataccgt aaagcacgag gaagcggtca gcccattcgc 3000 

cgccaagctc ttcagcaata tcacgggtag ccaacgctat gtcctgatag cggtccgcca 3060 

cacccagccg gccacagtcg atgaatccag aaaagcggcc attttccacc atgatattcg 3120 

gcaagcaggc atcgccatgg gtcacgacga gatcctcgcc gtcgggcatg ctcgccttga 3180 

gcctggcgaa cagttcggct ggcgcgagcc cctgatgctc ttcgtccaga tcatcctgat 3240 

cgacaagacc ggcttccatc cgagtacgtg ctcgctcgat gcgatgtttc gcttggtggt 3300 

cgaatgggca ggtagccgga tcaagcgtat gcagccgccg cattgcatca gccatgatgg 3360 

atactttctc ggcaggagca aggtgagatg acaggagatc ctgccccggc acttcgccca 3420 

atagcagcca gtcccttccc gcttcagtga caacgtcgag cacagctgcg caaggaacgc 3480 

ccgtcgtggc cagccacgat agccgcgctg cctcgtcttg cagttcattc agggcaccgg 3540 

acaggtcggt cttgacaaaa agaaccgggc gcccctgcgc tgacagccgg aacacggcgg 3600

catcagagca gccgattgtc tgttgtgccc agtcatagcc gaatagcctc tccacccaag 3660 

cggccggaga acctgcgtgc aatccatctt gttcaatcat gcgaaacgat cctcatcctg 3720 

tctcttgatc agatcttgat cccctgcgcc atcagatcct tggcggcgag aaagccatcc 3780 

agtttacttt gcagggcttc ccaaccttac cagagggcgc cccagctggc aattccggtt 3840 

cgcttgctgt ccataaaacc gcccagtcta gctatcgcca tgtaagccca ctgcaagcta 3900 

cctgctttct ctttgcgctt gcgttttccc ttgtccagat agcccagtag ctgacattca 3960 

tccggggtca gcaccgtttc tgcggactgg ctttctacgt gaaaaggatc taggtgaaga 4020 

tcctttttga taatctcatg accaaaatcc cttaacgtga gttttcgttc cactgagcgt 4080 

cagaccccgt agaaaagatc aaaggatctt cttgagatcc tttttttctg cgcgtaatct 4140 

gctgcttgca aacaaaaaaa ccaccgctac cagcggtggt ttgtttgccg gatcaagagc 4200

taccaactct ttttccgaag gtaactggct tcagcagagc gcagatacca aatactgtcc 4260

ttctagtgta gccgtagtta ggccaccact tcaagaactc tgtagcaccg cctacatacc 4320

tcgctctgct aatcctgtta ccagtggctg ctgccagtgg cgataagtcg tgtcttaccg 4380

ggttggactc aagacgatag ttaccggata aggcgcagcg gtcgggctga acggggggtt 4440

cgtgcacaca gcccagcttg gagcgaacga cctacaccga actgagatac ctacagcgtg 4500

agctatgaga aagcgccacg cttcccgaag ggagaaaggc ggacaggtat ccggtaagcg 4560

gcagggtcgg aacaggagag cgcacgaggg agcttccagg gggaaacgcc tggtatcttt 4620

atagtcctgt cgggtttcgc cacctctgac ttgagcgtcg atttttgtga tgctcgtcag 4680

gggggcggag cctatggaaa aacgccagca acgcggcctt tttacggttc ctgggctttt 4740

gctggccttt tgctcacatg ttctttcctg cgttatcccc tgattctgtg gataaccgta 4800

ttaccgcctt tgagtgagct gataccgctc gccgcagccg aacgaccgag cgcagcgagt 4860

cagtgagcga ggaagcggaa g 4881



<210> 10 
<211> 3413 
<212> DNA 
<213> viral 



<400> 10

agcgcccaat acgcaaaccg cctctccccg cgcgttggcc gattcattaa tgcagctggc 60

acgacaggtt tcccgactgg aaagcgggca gtgagcgcaa cgcaattaat gtgagttagc 120

tcactcatta ggcaccccag gctttacact ttatgcttcc ggctcgtatg ttgtgtggaa 180

ttgtgagcgg ataacaattt cacacaggaa acagctatga ccatgattac gccaagctat 240

ttaggtgaca ctatagaata ctcaagctat gcatcaagct tgggcccggt agggaccgag 300

cgctttgatt taaagcctgg ttctgctttg tatgatttat ctaaagcagc ccaatctaaa 360

gaaaccggtc ccgggcacta taaattgcct aacaagtgcg attcattcat ggatccttta 420

aactcgagtc tagagggccc gaattctgca gatatccatc acactggcgg ccgctcgagc 480

atgcatctag agggcccaat tcgccctata gtgagtcgta ttacaattca ctggccgtcg 540

ttttacaacg tcgtgactgg gaaaaccctg gcgttaccca acttaatcgc cttgcagcac 600

atcccccttt cgccagctgg cgtaatagcg aagaggcccg caccgatcgc ccttcccaac 660

agttgcgcag cctatacgta cggcagttta aggtttacac ctataaaaga gagagccgtt 720

atcgtctgtt tgtggatgta cagagtgata ttattgacac gccggggcga cggatggtga 780

tccccctggc cagtgcacgt ctgctgtcag ataaagtctc ccgtgaactt tacccggtgg 840

tgcatatcgg ggatgaaagc tggcgcatga tgaccaccga tatggccagt gtgccggtct 900

ccgttatcgg ggaagaagtg gctgatctca gccaccgcga aaatgacatc aaaaacgcca 960

ttaacctgat gttctgggga atataaatgt caggcctgaa tggcgaatgg acgcgccctg 1020

tagcggcgca ttaagcgcgc gggtgtggtg gttacgcgca gcgtgaccgc tacacttgcc 1080

agcgccctag cgcccgctcc tttcgctttc ttcccttcct ttctcgccac gttcgccggc 1140

tttccccgtc aagctctaaa tcgggggctc cctttagggt tccgatttag agctttacgg 1200

cacctcgacc gcaaaaaact tgatttgggt gatggttcac gtagtgggcc atcgccctga 1260

tagacggttt ttcgcccttt gacgttggag tccacgttct ttaatagtgg actcttgttc 1320

caaactggaa caacactcaa ccctatcgcg gtctattctt ttgatttata agggatgttg 1380

ccgatttcgg cctattggtt aaaaaatgag ctgatttaac aaaaatttta acaaaattca 1440

gaagaactcg tcaagaaggc gatagaaggc gatgcgctgc gaatcgggag cggcgatacc 1500

gtaaagcacg aggaagcggt cagcccattc gccgccaagc tcttcagcaa tatcacgggt 1560

agccaacgct atgtcctgat agcggtccgc cacacccagc cggccacagt cgatgaatcc 1620

agaaaagcgg ccattttcca ccatgatatt cggcaagcag gcatcgccat gggtcacgac 1680

gagatcctcg ccgtcgggca tgctcgcctt gagcctggcg aacagttcgg ctggcgcgag 1740

cccctgatgc tcttcgtcca gatcatcctg atcgacaaga ccggcttcca tccgagtacg 1800

tgctcgctcg atgcgatgtt tcgcttggtg gtcgaatggg caggtagccg gatcaagcgt 1860

atgcagccgc cgcattgcat cagccatgat ggatactttc tcggcaggag caaggtgaga 1920

tgacaggaga tcctgccccg gcacttcgcc caatagcagc cagtcccttc ccgcttcagt 1980

gacaacgtcg agcacagctg cgcaaggaac gcccgtcgtg gccagccacg atagccgcgc 2040

tgcctcgtct tgcagttcat tcagggcacc ggacaggtcg gtcttgacaa aaagaaccgg 2100

gcgcccctgc gctgacagcc ggaacacggc ggcatcagag cagccgattg tctgttgtgc 2160

ccagtcatag ccgaatagcc tctccaccca agcggccgga gaacctgcgt gcaatccatc 2220

ttgttcaatc atgcgaaacg atcctcatcc tgtctcttga tcagatcttg atcccctgcg 2280

ccatcagatc cttggcggcg agaaagccat ccagtttact ttgcagggct tcccaacctt 2340

accagagggc gccccagctg gcaattccgg ttcgcttgct gtccataaaa ccgcccagtc 2400

tagctatcgc catgtaagcc cactgcaagc tacctgcttt ctctttgcgc ttgcgttttc 2460

ccttgtccag atagcccagt agctgacatt catccggggt cagcaccgtt tctgcggact 2520

ggctttctac gtgaaaagga tctaggtgaa gatccttttt gataatctca tgaccaaaat 2580

cccttaacgt gagttttcgt tccactgagc gtcagacccc gtagaaaaga tcaaaggatc 2640

ttcttgagat cctttttttc tgcgcgtaat ctgctgcttg caaacaaaaa aaccaccgct 2700

accagcggtg gtttgtttgc cggatcaaga gctaccaact ctttttccga aggtaactgg 2760

cttcagcaga gcgcagatac caaatactgt ccttctagtg tagccgtagt taggccacca 2820

cttcaagaac tctgtagcac cgcctacata cctcgctctg ctaatcctgt taccagtggc 2880

tgctgccagt ggcgataagt cgtgtcttac cgggttggac tcaagacgat agttaccgga 2940

taaggcgcag cggtcgggct gaacgggggg ttcgtgcaca cagcccagct tggagcgaac 3000

gacctacacc gaactgagat acctacagcg tgagctatga gaaagcgcca cgcttcccga 3060

agggagaaag gcggacaggt atccggtaag cggcagggtc ggaacaggag agcgcacgag 3120

ggagcttcca gggggaaacg cctggtatct ttatagtcct gtcgggtttc gccacctctg 3180

acttgagcgt cgatttttgt gatgctcgtc aggggggcgg agcctatgga aaaacgccag 3240

caacgcggcc tttttacggt tcctgggctt ttgctggcct tttgctcaca tgttctttcc 3300

tgcgttatcc cctgattctg tggataaccg tattaccgcc tttgagtgag ctgataccgc 3360

tcgccgcagc cgaacgaccg agcgcagcga gtcagtgagc gaggaagcgg aag 3413



<210> 11 
<211> 4961 
<212> DNA 
<213> Viral 



<400> 11

agcgcccaat acgcaaaccg cctctccccg cgcgttggcc gattcattaa tgcagctggc 60

acgacaggtt tcccgactgg aaagcgggca gtgagcgcaa cgcaattaat gtgagttagc 120

tcactcatta ggcaccccag gctttacact ttatgcttcc ggctcgtatg ttgtgtggaa 180

ttgtgagcgg ataacaattt cacacaggaa acagctatga ccatgattac gccaagctat 240

ttaggtgaca ctatagaata ctcaagctat gcatcaagct tggtaccgag ctcggatcca 300

ctagtaacgg ccgccagtgt gctggaattc atgggcagac ccgtctgtac tttaagagtg 360

ttggcaacca gtaatgaata aaaactcccg ttttattata tttgatgaat gctgaaagct 420

tacattaata tgtcgtgcga tggcacgaaa aaacacacgc aaacaataca ggggggtagt 480

cggcgggcgg ctaagggtgg tgctcggcgg gcagaacatc gaaaaatcaa gatctatatg 540

aattacactt cctccgtagg aggaagcaca gggggagaat accacttctc ccccggcgac 600

ataatgtaaa tgacgcagtt tgcctcgaaa tactccagct gccctggagt catttccttc 660

atccaatctt catccgagtt ggcgaggatt attgtaggct tagacttctt ctgcaccttt 720

ttcttcttac catacttggg gtttacaatg aaatccctct gacagccaac taactgtttc 780

caacaaggac agaatttaaa cggaatatca tctacgatgt tgtagattgc gtcttcgttg 840

tatgaagacc aatcaacatt attttgccag taattatgaa cccctaggct tctggcccaa 900

gtagattttc cggttcttgt tgggccgacg atgtagaggc tctgctttct tgatctttca 960

tctgatgact ggatacagaa tccatccatt ggaggtcaga aattgcatcc tcgagggtat 1020

aacaggtagg ttgaaggagc atgtaagctt cgggactaac ctggaagatg ttaggctgga 1080

gccaatcgtt gattgactca ttacaaagta aatcaggtga ggagggtgga tgaggattgg 1140

tgaactcttc ctgaatctca ggaaaaagct tatttgcaga gtattcaaaa tactgcaatt 1200

ttgtggacca atcaaagggg agctctttct ggatcatgga gaggtactct tctttggagg 1260

tagcgtgtga aataatgtct cgcattattt catctttaga aggctttttt tcctttacct 1320

ctgaatcaga ttttcctagg aagggggact tcctaggaat gaaagtacct ctctcaaaca 1380

cagccagagg ttccttgaga atgtaatccc tcactctgtt aactgacttg gcactctgaa 1440

tatttgggtg aaacccattt atatcaaaga accttgagtc agatatcctt atcggcttct 1500

ctggctgaag caatgcatgt aaatgcaaac ttccatcttt atgtgcctct cgggcacata 1560

gaatatattt gggaatccaa cgaacgacga gctcccagat catctgacag gcgatttcag 1620

gattttctgg acactttgga taggttagga acgtgttagc gttcctgtgt gagaactgac 1680

ggttggatga ggaggaggcc atagccgacg acggaggttg aggctgaggg atggcagact 1740

gggagctcca aactctatag tatacccgtg cgccttcgaa atccgccgct ccattgtctt 1800

atagtggttg taaatgggcc ggaccgggcc ggcccagcag gaaaagaagg cgcgcactaa 1860

tattaccgcg ccttcttttc ctgcgagggc ccggtaggga ccgagcgctt tgatttaaag 1920 

cctggttctg ctttgtatga tttatctaaa gcagcccaat ctaaagaaac cggtcccggg 1980 

cactataaat tgcctaacaa gtgcgattca ttcatggatc ctttaaactc gagtctagag 2040 

ggcccaattc gccctatagt gagtcgtatt acaattcact ggccgtcgtt ttacaacgtc 2100 

gtgactggga aaaccctggc gttacccaac ttaatcgcct tgcagcacat ccccctttcg 2160 

ccagctggcg taatagcgaa gaggcccgca ccgatcgccc ttcccaacag ttgcgcagcc 2220 

tatacgtacg gcagtttaag gtttacacct ataaaagaga gagccgttat cgtctgtttg 2280 

tggatgtaca gagtgatatt attgacacgc cggggcgacg gatggtgatc cccctggcca 2340 

gtgcacgtct gctgtcagat aaagtctccc gtgaacttta cccggtggtg catatcgggg 2400 

atgaaagctg gcgcatgatg accaccgata tggccagtgt gccggtctcc gttatcgggg 2460 

aagaagtggc tgatctcagc caccgcgaaa atgacatcaa aaacgccatt aacctgatgt 2520 

tctggggaat ataaatgtca ggcctgaatg gcgaatggac gcgccctgta gcggcgcatt 2580 

aagcgcgcgg gtgtggtggt tacgcgcagc gtgaccgcta cacttgccag cgccctagcg 2640

cccgctcctt tcgctttctt cccttccttt ctcgccacgt tcgccggctt tccccgtcaa 2700 

gctctaaatc gggggctccc tttagggttc cgatttagag ctttacggca cctcgaccgc 2760 

aaaaaacttg atttgggtga tggttcacgt agtgggccat cgccctgata gacggttttt 2820 

cgccctttga cgttggagtc cacgttcttt aatagtggac tcttgttcca aactggaaca 2880 

acactcaacc ctatcgcggt ctattctttt gatttataag ggatgttgcc gatttcggcc 2940 

tattggttaa aaaatgagct gatttaacaa aaattttaac aaaattcaga agaactcgtc 3000 

aagaaggcga tagaaggcga tgcgctgcga atcgggagcg gcgataccgt aaagcacgag 3060 

gaagcggtca gcccattcgc cgccaagctc ttcagcaata tcacgggtag ccaacgctat 3120 

gtcctgatag cggtccgcca cacccagccg gccacagtcg atgaatccag aaaagcggcc 3180 

attttccacc atgatattcg gcaagcaggc atcgccatgg gtcacgacga gatcctcgcc 3240 

gtcgggcatg ctcgccttga gcctggcgaa cagttcggct ggcgcgagcc cctgatgctc 3300 

ttcgtccaga tcatcctgat cgacaagacc ggcttccatc cgagtacgtg ctcgctcgat 3360 

gcgatgtttc gcttggtggt cgaatgggca ggtagccgga tcaagcgtat gcagccgccg 3420 

cattgcatca gccatgatgg atactttctc ggcaggagca aggtgagatg acaggagatc 3480 

ctgccccggc acttcgccca atagcagcca gtcccttccc gcttcagtga caacgtcgag 3540 

cacagctgcg caaggaacgc ccgtcgtggc cagccacgat agccgcgctg cctcgtcttg 3600 

cagttcattc agggcaccgg acaggtcggt cttgacaaaa agaaccgggc gcccctgcgc 3660 

tgacagccgg aacacggcgg catcagagca gccgattgtc tgttgtgccc agtcatagcc 3720 

gaatagcctc tccacccaag cggccggaga acctgcgtgc aatccatctt gttcaatcat 3780 

gcgaaacgat cctcatcctg tctcttgatc agatcttgat cccctgcgcc atcagatcct 3840 

tggcggcgag aaagccatcc agtttacttt gcagggcttc ccaaccttac cagagggcgc 3900 

cccagctggc aattccggtt cgcttgctgt ccataaaacc gcccagtcta gctatcgcca 3960 

tgtaagccca ctgcaagcta cctgctttct ctttgcgctt gcgttttccc ttgtccagat 4020 

agcccagtag ctgacattca tccggggtca gcaccgtttc tgcggactgg ctttctacgt 4080 

gaaaaggatc taggtgaaga tcctttttga taatctcatg accaaaatcc cttaacgtga 4140 

gttttcgttc cactgagcgt cagaccccgt agaaaagatc aaaggatctt cttgagatcc 4200 

tttttttctg cgcgtaatct gctgcttgca aacaaaaaaa ccaccgctac cagcggtggt 4260 

ttgtttgccg gatcaagagc taccaactct ttttccgaag gtaactggct tcagcagagc 4320 

gcagatacca aatactgtcc ttctagtgta gccgtagtta ggccaccact tcaagaactc 4380 

tgtagcaccg cctacatacc tcgctctgct aatcctgtta ccagtggctg ctgccagtgg 4440 

cgataagtcg tgtcttaccg ggttggactc aagacgatag ttaccggata aggcgcagcg 4500 

gtcgggctga acggggggtt cgtgcacaca gcccagcttg gagcgaacga cctacaccga 4560 

actgagatac ctacagcgtg agctatgaga aagcgccacg cttcccgaag ggagaaaggc 4620

ggacaggtat ccggtaagcg gcagggtcgg aacaggagag cgcacgaggg agcttccagg 4680

gggaaacgcc tggtatcttt atagtcctgt cgggtttcgc cacctctgac ttgagcgtcg 4740 

atttttgtga tgctcgtcag gggggcggag cctatggaaa aacgccagca acgcggcctt 4800 

tttacggttc ctgggctttt gctggccttt tgctcacatg ttctttcctg cgttatcccc 4860 

tgattctgtg gataaccgta ttaccgcctt tgagtgagct gataccgctc gccgcagccg 4920 

aacgaccgag cgcagcgagt cagtgagcga ggaagcggaa g 4961 

<210> 12 
<211> 6309 
<212> DNA 
<213> Viral



<400> 12 

agcgcccaat acgcaaaccg cctctccccg cgcgttggcc gattcattaa tgcagctggc 60 

acgacaggtt tcccgactgg aaagcgggca gtgagcgcaa cgcaattaat gtgagttagc 120 

tcactcatta ggcaccccag gctttacact ttatgcttcc ggctcgtatg ttgtgtggaa 180 

ttgtgagcgg ataacaattt cacacaggaa acagctatga ccatgattac gccaagctat 240 

ttaggtgaca ctatagaata ctcaagctat gcatcaagct tggtaccgag ctcggatcca 300 

ctagtcccga tctagtaaca tagatgacac cgcgcgcgat aatttatcct agtttgcgcg 360 

ctatattttg ttttctatcg cgtattaaat gtataattgc gggactctaa tcataaaaac 420 

ccatctcata aataacgtca tgcattacat gttaattatt acatgcttaa cgtaattcaa 480 

cagaaattat atgataatca tcgacagacc ggcaacagga ttcaatctta agaaacttta 540 

ttgccaaatg tttgaacgat cggggaaatt cgctcgagtt aattaagcgg ccgcctcaaa 600 

aaggatcttc acctagatcc ttttaaatta aaaatgaagt tttagcacgt gtcagtcctg 660 

ctcctcggcc acgaagtgca cgcagttgcc ggccgggtcg cgcagggcga actcccgccc 720

ccacggctgc tcgccgatct cggtcatggc cggcccggag gcgtcccgga agttcgtgga 780 

cacgacctcc gaccactcgg cgtacagctc gtccaggccg cgcacccaca cccaggccag 840 

ggtgttgtcc ggcaccacct ggtcctggac cgcgctgatg aacagggtca cgtcgtcccg 900 

gaccacaccg gcgaagtcgt cctccacgaa gtcccgggag aacccgagcc ggtcggtcca 960 

gaactcgacc gctccggcga cgtcgcgcgc ggtgagcacc ggaacggcac tggtcaactt 1020 

ggccatggtg gccctcctca cgtgctatta ttgaagcatt tatcagggtt attgtctcat 1080 

gagcggatac atatttgaat gtatttagaa aaataaacaa ataggggttc cgcgcacatt 1140 

tccccgaaaa gtgccacctg tatgcggtgt gaaataccgc acagatgcgt aaggagaaaa 1200

taccgcatca ggcgaaattg taaacgcggc cgcttaatta agtcgacgtc ctctccaaat 1260 

gaaatgaact tccttatata gaggaagggt cttgcgaagg atagtgggat tgtgcgtcat 1320 

cccttacgtc agtggagata tcacatcaat ccacttgctt tgaagacgtg gttggaacgt 1380 

cttctttttc cacgtagctc ctcgtgggtg ggggtccatc tttgggacca ctgtcggcag 1440 

aggcatcttg aacgatagcc tttccttatc gcaatgatgg catttgtagg tgccaccttc 1500 

cttttctact gtccttttga tgaagtgaca gatagctggg caatggaatc cgaggaggtt 1560 

tcccgatatt accctttgtt gaaaagtctc aatagccctt tggtcttctg agactgtatc 1620 

tttgatattc ttggagtaga cgagagagtg tcgtgctcca ccatgttgac gaattcatgg 1680 

gcagacccgt ctgtacttta agagtgttgg caaccagtaa tgaataaaaa ctcccgtttt 1740 

attatatttg atgaatgctg aaagcttaca ttaatatgtc gtgcgatggc acgaaaaaac 1800 

acacgcaaac aatacagggg ggtagtcggc gggcggctaa gggtggtgct cggcgggcag 1860 

aacatcgaaa aatcaagatc tatatgaatt acacttcctc cgtaggagga agcacagggg 1920

gagaatacca cttctccccc ggcgacataa tgtaaatgac gcagtttgcc tcgaaatact 1980 

ccagctgccc tggagtcatt tccttcatcc aatcttcatc cgagttggcg aggattattg 2040 

taggcttaga cttcttctgc acctttttct tcttaccata cttggggttt acaatgaaat 2100 

ccctctgaca gccaactaac tgtttccaac aaggacagaa tttaaacgga atatcatcta 2160 

cgatgttgta gattgcgtct tcgttgtatg aagaccaatc aacattattt tgccagtaat 2220 

tatgaacccc taggcttctg gcccaagtag attttccggt tcttgttggg ccgacgatgt 2280 

agaggctctg ctttcttgat ctttcatctg atgactggat acagaatcca tccattggag 2340 

gtcagaaatt gcatcctcga gggtataaca ggtaggttga aggagcatgt aagcttcggg 2400 

actaacctgg aagatgttag gctggagcca atcgttgatt gactcattac aaagtaaatc 2460 

aggtgaggag ggtggatgag gattggtgaa ctcttcctga atctcaggaa aaagcttatt 2520 

tgcagagtat tcaaaatact gcaattttgt ggaccaatca aaggggagct ctttctggat 2580 

catggagagg tactcttctt tggaggtagc gtgtgaaata atgtctcgca ttatttcatc 2640 

tttagaaggc tttttttcct ttacctctga atcagatttt cctaggaagg gggacttcct 2700 

aggaatgaaa gtacctctct caaacacagc cagaggttcc ttgagaatgt aatccctcac 2760 

tctgttaact gacttggcac tctgaatatt tgggtgaaac ccatttatat caaagaacct 2820 

tgagtcagat atccttatcg gcttctctgg ctgaagcaat gcatgtaaat gcaaacttcc 2880 

atctttatgt gcctctcggg cacatagaat atatttggga atccaacgaa cgacgagctc 2940 

ccagatcatc tgacaggcga tttcaggatt ttctggacac tttggatagg ttaggaacgt 3000 

gttagcgttc ctgtgtgaga actgacggtt ggatgaggag gaggccatag ccgacgacgg 3060 

aggttgaggc tgagggatgg cagactggga gctccaaact ctatagtata cccgtgcgcc 3120 

ttcgaaatcc gccgctccat tgtcttatag tggttgtaaa tgggccggac cgggccggcc 3180 

cagcaggaaa agaaggcgcg cactaatatt accgcgcctt cttttcctgc gagggcccgg 3240 
ggtagggacc gagcgctttg atttaaagcc tggttctgct ttgtatgatt tatctaaagc 3300

agcccaatct aaagaaaccg gtcccgggca ctataaattg cctaacaagt gcgattcatt 3360 

catggatcct ttaaactcga gtctagaggg cccaattcgc cctatagtga gtcgtattac 3420 

aattcactgg ccgtcgtttt acaacgtcgt gactgggaaa accctggcgt tacccaactt 3480 

aatcgccttg cagcacatcc ccctttcgcc agctggcgta atagcgaaga ggcccgcacc 3540 

gatcgccctt cccaacagtt gcgcagccta tacgtacggc agtttaaggt ttacacctat 3600 

aaaagagaga gccgttatcg tctgtttgtg gatgtacaga gtgatattat tgacacgccg 3660

gggcgacgga tggtgatccc cctggccagt gcacgtctgc tgtcagataa agtctcccgt 3720 

gaactttacc cggtggtgca tatcggggat gaaagctggc gcatgatgac caccgatatg 3780 

gccagtgtgc cggtctccgt tatcggggaa gaagtggctg atctcagcca ccgcgaaaat 3840

gacatcaaaa acgccattaa cctgatgttc tggggaatat aaatgtcagg cctgaatggc 3900 

gaatggacgc gccctgtagc ggcgcattaa gcgcgcgggt gtggtggtta cgcgcagcgt 3960

gaccgctaca cttgccagcg ccctagcgcc cgctcctttc gctttcttcc cttcctttct 4020 

cgccacgttc gccggctttc cccgtcaagc tctaaatcgg gggctccctt tagggttccg 4080 

atttagagct ttacggcacc tcgaccgcaa aaaacttgat ttgggtgatg gttcacgtag 4140 

tgggccatcg ccctgataga cggtttttcg ccctttgacg ttggagtcca cgttctttaa 4200 

tagtggactc ttgttccaaa ctggaacaac actcaaccct atcgcggtct attcttttga 4260 

tttataaggg atgttgccga tttcggccta ttggttaaaa aatgagctga tttaacaaaa 4320 

attttaacaa aattcagaag aactcgtcaa gaaggcgata gaaggcgatg cgctgcgaat 4380 

cgggagcggc gataccgtaa agcacgagga agcggtcagc ccattcgccg ccaagctctt 4440 

cagcaatatc acgggtagcc aacgctatgt cctgatagcg gtccgccaca cccagccggc 4500 

cacagtcgat gaatccagaa aagcggccat tttccaccat gatattcggc aagcaggcat 4560 

cgccatgggt cacgacgaga tcctcgccgt cgggcatgct cgccttgagc ctggcgaaca 4620 

gttcggctgg cgcgagcccc tgatgctctt cgtccagatc atcctgatcg acaagaccgg 4680 

cttccatccg agtacgtgct cgctcgatgc gatgtttcgc ttggtggtcg aatgggcagg 4740 

tagccggatc aagcgtatgc agccgccgca ttgcatcagc catgatggat actttctcgg 4800 

caggagcaag gtgagatgac aggagatcct gccccggcac ttcgcccaat agcagccagt 4860 

cccttcccgc ttcagtgaca acgtcgagca cagctgcgca aggaacgccc gtcgtggcca 4920 

gccacgatag ccgcgctgcc tcgtcttgca gttcattcag ggcaccggac aggtcggtct 4980 

tgacaaaaag aaccgggcgc ccctgcgctg acagccggaa cacggcggca tcagagcagc 5040 

cgattgtctg ttgtgcccag tcatagccga atagcctctc cacccaagcg gccggagaac 5100 

ctgcgtgcaa tccatcttgt tcaatcatgc gaaacgatcc tcatcctgtc tcttgatcag 5160 

atcttgatcc cctgcgccat cagatccttg gcggcgagaa agccatccag tttactttgc 5220 

agggcttccc aaccttacca gagggcgccc cagctggcaa ttccggttcg cttgctgtcc 5280 

ataaaaccgc ccagtctagc tatcgccatg taagcccact gcaagctacc tgctttctct 5340 

ttgcgcttgc gttttccctt gtccagatag cccagtagct gacattcatc cggggtcagc 5400 

accgtttctg cggactggct ttctacgtga aaaggatcta ggtgaagatc ctttttgata 5460 

atctcatgac caaaatccct taacgtgagt tttcgttcca ctgagcgtca gaccccgtag 5520 

aaaagatcaa aggatcttct tgagatcctt tttttctgcg cgtaatctgc tgcttgcaaa 5580 

caaaaaaacc accgctacca gcggtggttt gtttgccgga tcaagagcta ccaactcttt 5640 

ttccgaaggt aactggcttc agcagagcgc agataccaaa tactgtcctt ctagtgtagc 5700 

cgtagttagg ccaccacttc aagaactctg tagcaccgcc tacatacctc gctctgctaa 5760 

tcctgttacc agtggctgct gccagtggcg ataagtcgtg tcttaccggg ttggactcaa 5820 

gacgatagtt accggataag gcgcagcggt cgggctgaac ggggggttcg tgcacacagc 5880 

ccagcttgga gcgaacgacc tacaccgaac tgagatacct acagcgtgag ctatgagaaa 5940 

gcgccacgct tcccgaaggg agaaaggcgg acaggtatcc ggtaagcggc agggtcggaa 6000 

caggagagcg cacgagggag cttccagggg gaaacgcctg gtatctttat agtcctgtcg 6060 

ggtttcgcca cctctgactt gagcgtcgat ttttgtgatg ctcgtcaggg gggcggagcc 6120 

tatggaaaaa cgccagcaac gcggcctttt tacggttcct gggcttttgc tggccttttg 6180 

ctcacatgtt ctttcctgcg ttatcccctg attctgtgga taaccgtatt accgcctttg 6240 

agtgagctga taccgctcgc cgcagccgaa cgaccgagcg cagcgagtca gtgagcgagg 6300 

aagcggaag 6309 



<210> 13 
<211> 8043 
<212> DNA 
<213> Viral 



<400> 13 

agcgcccaat acgcaaaccg cctctccccg cgcgttggcc gattcattaa tgcagctggc 60 

acgacaggtt tcccgactgg aaagcgggca gtgagcgcaa cgcaattaat gtgagttagc 120 

tcactcatta ggcaccccag gctttacact ttatgcttcc ggctcgtatg ttgtgtggaa 180 

ttgtgagcgg ataacaattt cacacaggaa acagctatga ccatgattac gccaagctat 240 

ttaggtgaca ctatagaata ctcaagctat gcatcaagct tggtaccgag ctcggatcca 300 

ctagtaacgg ccgccagtgt gctggaattc atgggcagac ccgtctgtac tttaagagtg 360 

ttggcaacca gtaatgaata aaaactcccg ttttattata tttgatgaat gctgaaagct 420 

tacattaata tgtcgtgcga tggcacgaaa aaacacacgc aaacaataca ggggggtagt 480 

cggcgggcgg ctaagggtgg tgctcggcgg gcagaacatc gaaaaatcaa gatctatatg 540 

aattacactt cctccgtagg aggaagcaca gggggagaat accacttctc ccccggcgac 600 

ataatgtaaa tgacgcagtt tgcctcgaaa tactccagct gccctggagt catttccttc 660 

atccaatctt catccgagtt ggcgaggatt attgtaggct tagacttctt ctgcaccttt 720 

ttcttcttac catacttggg gtttacaatg aaatccctct gacagccaac taactgtttc 780 

caacaaggac agaatttaaa cggaatatca tctacgatgt tgtagattgc gtcttcgttg 840 

tatgaagacc aatcaacatt attttgccag taattatgaa cccctaggct tctggcccaa 900 

gtagattttc cggttcttgt tgggccgacg atgtagaggc tctgctttct tgatctttca 960 

tctgatgact ggatacagaa tccatccatt ggaggtcaga aattgcatcc tcgagggtat 1020 

aacaggtagg ttgaaggagc atgtaagctt cgggactaac ctggaagatg ttaggctgga 1080 

gccaatcgtt gattgactca ttacaaagta aatcaggtga ggagggtgga tgaggattgg 1140 

tgaactcttc ctgaatctca ggaaaaagct tatttgcaga gtattcaaaa tactgcaatt 1200 

ttgtggacca atcaaagggg agctctttct ggatcatgga gaggtactct tctttggagg 1260 

tagcgtgtga aataatgtct cgcattattt catctttaga aggctttttt tcctttacct 1320 

ctgaatcaga ttttcctagg aagggggact tcctaggaat gaaagtacct ctctcaaaca 1380 

cagccagagg ttccttgaga atgtaatccc tcactctgtt aactgacttg gcactctgaa 1440 

tatttgggtg aaacccattt atatcaaaga accttgagtc agatatcctt atcggcttct 1500 

ctggctgaag caatgcatgt aaatgcaaac ttccatcttt atgtgcctct cgggcacata 1560 

gaatatattt gggaatccaa cgaacgacga gctcccagat catctgacag gcgatttcag 1620 

gattttctgg acactttgga taggttagga acgtgttagc gttcctgtgt gagaactgac 1680 

ggttggatga ggaggaggcc atagccgacg acggaggttg aggctgaggg atggcagact 1740 

gggagctcca aactctatag tatacccgtg cgccttcgaa atccgccgct ccattgtctt 1800 

atagtggttg taaatgggcc ggaccgggcc ggcccagcag gaaaagaagg cgcgcactaa 1860 

tattaccgcg ccttcttttc ctgcgagggc ccggtaggga ccgagcgctt tgatttaaag 1920 

cctggttctg ctttgtatga tttatctaaa gcagcccaat ctaaagaaac cggtcccggg 1980 

cactataaat tgcctaacaa gtgcgattca ttcatggatc ctttaaactc gagtctagtc 2040 

ccgatctagt aacatagatg acaccgcgcg cgataattta tcctagtttg cgcgctatat 2100 

tttgttttct atcgcgtatt aaatgtataa ttgcgggact ctaatcataa aaacccatct 2160 

cataaataac gtcatgcatt acatgttaat tattacatgc ttaacgtaat tcaacagaaa 2220 

ttatatgata atcatcgaca gaccggcaac aggattcaat cttaagaaac tttattgcca 2280 

aatgtttgaa cgatcgggga aattcgctcg agttaattaa gcggccgcct caaaaaggat 2340 

cttcacctag atccttttaa attaaaaatg aagttttagc acgtgtcagt cctgctcctc 2400 

ggccacgaag tgcacgcagt tgccggccgg gtcgcgcagg gcgaactccc gcccccacgg 2460 

ctgctcgccg atctcggtca tggccggccc ggaggcgtcc cggaagttcg tggacacgac 2520 

ctccgaccac tcggcgtaca gctcgtccag gccgcgcacc cacacccagg ccagggtgtt 2580 

gtccggcacc acctggtcct ggaccgcgct gatgaacagg gtcacgtcgt cccggaccac 2640 

accggcgaag tcgtcctcca cgaagtcccg ggagaacccg agccggtcgg tccagaactc 2700 

gaccgctccg gcgacgtcgc gcgcggtgag caccggaacg gcactggtca acttggccat 2760 

ggtggccctc ctcacgtgct attattgaag catttatcag ggttattgtc tcatgagcgg 2820 

atacatattt gaatgtattt agaaaaataa acaaataggg gttccgcgca catttccccg 2880 

aaaagtgcca cctgtatgcg gtgtgaaata ccgcacagat gcgtaaggag aaaataccgc 2940 

atcaggcgaa attgtaaacg cggccgctta attaagtcga cgtcctctcc aaatgaaatg 3000 

aacttcctta tatagaggaa gggtcttgcg aaggatagtg ggattgtgcg tcatccctta 3060 

cgtcagtgga gatatcacat caatccactt gctttgaaga cgtggttgga acgtcttctt 3120 

tttccacgta gctcctcgtg ggtgggggtc catctttggg accactgtcg gcagaggcat 3180 
cttgaacgat agcctttcct tatcgcaatg atggcatttg taggtgccac cttccttttc 3240 

tactgtcctt ttgatgaagt gacagatagc tgggcaatgg aatccgagga ggtttcccga 3300 

tattaccctt tgttgaaaag tctcaatagc cctttggtct tctgagactg tatctttgat 3360 

attcttggag tagacgagag agtgtcgtgc tccaccatgt tgacgaattc atgggcagac 3420 

ccgtctgtac tttaagagtg ttggcaacca gtaatgaata aaaactcccg ttttattata 3480 

tttgatgaat gctgaaagct tacattaata tgtcgtgcga tggcacgaaa aaacacacgc 3540 

aaacaataca ggggggtagt cggcgggcgg ctaagggtgg tgctcggcgg gcagaacatc 3600 

gaaaaatcaa gatctatatg aattacactt cctccgtagg aggaagcaca gggggagaat 3660 

accacttctc ccccggcgac ataatgtaaa tgacgcagtt tgcctcgaaa tactccagct 3720 

gccctggagt catttccttc atccaatctt catccgagtt ggcgaggatt attgtaggct 3780 

tagacttctt ctgcaccttt ttcttcttac catacttggg gtttacaatg aaatccctct 3840 

gacagccaac taactgtttc caacaaggac agaatttaaa cggaatatca tctacgatgt 3900 

tgtagattgc gtcttcgttg tatgaagacc aatcaacatt attttgccag taattatgaa 3960 

cccctaggct tctggcccaa gtagattttc cggttcttgt tgggccgacg atgtagaggc 4020 

tctgctttct tgatctttca tctgatgact ggatacagaa tccatccatt ggaggtcaga 4080 

aattgcatcc tcgagggtat aacaggtagg ttgaaggagc atgtaagctt cgggactaac 4140 

ctggaagatg ttaggctgga gccaatcgtt gattgactca ttacaaagta aatcaggtga 4200 

ggagggtgga tgaggattgg tgaactcttc ctgaatctca ggaaaaagct tatttgcaga 4260 

gtattcaaaa tactgcaatt ttgtggacca atcaaagggg agctctttct ggatcatgga 4320 

gaggtactct tctttggagg tagcgtgtga aataatgtct cgcattattt catctttaga 4380 

aggctttttt tcctttacct ctgaatcaga ttttcctagg aagggggact tcctaggaat 4440 

gaaagtacct ctctcaaaca cagccagagg ttccttgaga atgtaatccc tcactctgtt 4500 

aactgacttg gcactctgaa tatttgggtg aaacccattt atatcaaaga accttgagtc 4560 

agatatcctt atcggcttct ctggctgaag caatgcatgt aaatgcaaac ttccatcttt 4620 

atgtgcctct cgggcacata gaatatattt gggaatccaa cgaacgacga gctcccagat 4680 

catctgacag gcgatttcag gattttctgg acactttgga taggttagga acgtgttagc 4740 

gttcctgtgt gagaactgac ggttggatga ggaggaggcc atagccgacg acggaggttg 4800 

aggctgaggg atggcagact gggagctcca aactctatag tatacccgtg cgccttcgaa 4860 

atccgccgct ccattgtctt atagtggttg taaatgggcc ggaccgggcc ggcccagcag 4920 

gaaaagaagg cgcgcactaa tattaccgcg ccttcttttc ctgcgagggc ccggggtagg 4980 

gaccgagcgc tttgatttaa agcctggttc tgctttgtat gatttatcta aagcagccca 5040 

atctaaagaa accggtcccg ggcactataa attgcctaac aagtgcgatt cattcatgga 5100 

tcctttaaac tcgagtctag agggcccaat tcgccctata gtgagtcgta ttacaattca 5160 

ctggccgtcg ttttacaacg tcgtgactgg gaaaaccctg gcgttaccca acttaatcgc 5220 

cttgcagcac atcccccttt cgccagctgg cgtaatagcg aagaggcccg caccgatcgc 5280 

ccttcccaac agttgcgcag cctatacgta cggcagttta aggtttacac ctataaaaga 5340 

gagagccgtt atcgtctgtt tgtggatgta cagagtgata ttattgacac gccggggcga 5400 

cggatggtga tccccctggc cagtgcacgt ctgctgtcag ataaagtctc ccgtgaactt 5460 

tacccggtgg tgcatatcgg ggatgaaagc tggcgcatga tgaccaccga tatggccagt 5520 

gtgccggtct ccgttatcgg ggaagaagtg gctgatctca gccaccgcga aaatgacatc 5580 

aaaaacgcca ttaacctgat gttctgggga atataaatgt caggcctgaa tggcgaatgg 5640 

acgcgccctg tagcggcgca ttaagcgcgc gggtgtggtg gttacgcgca gcgtgaccgc 5700 

tacacttgcc agcgccctag cgcccgctcc tttcgctttc ttcccttcct ttctcgccac 5760 

gttcgccggc tttccccgtc aagctctaaa tcgggggctc cctttagggt tccgatttag 5820 

agctttacgg cacctcgacc gcaaaaaact tgatttgggt gatggttcac gtagtgggcc 5880 

atcgccctga tagacggttt ttcgcccttt gacgttggag tccacgttct ttaatagtgg 5940 

actcttgttc caaactggaa caacactcaa ccctatcgcg gtctattctt ttgatttata 6000 

agggatgttg ccgatttcgg cctattggtt aaaaaatgag ctgatttaac aaaaatttta 6060 

acaaaattca gaagaactcg tcaagaaggc gatagaaggc gatgcgctgc gaatcgggag 6120 

cggcgatacc gtaaagcacg aggaagcggt cagcccattc gccgccaagc tcttcagcaa 6180 

tatcacgggt agccaacgct atgtcctgat agcggtccgc cacacccagc cggccacagt 6240 

cgatgaatcc agaaaagcgg ccattttcca ccatgatatt cggcaagcag gcatcgccat 6300 

gggtcacgac gagatcctcg ccgtcgggca tgctcgcctt gagcctggcg aacagttcgg 6360 

ctggcgcgag cccctgatgc tcttcgtcca gatcatcctg atcgacaaga ccggcttcca 6420 

tccgagtacg tgctcgctcg atgcgatgtt tcgcttggtg gtcgaatggg caggtagccg 6480 

gatcaagcgt atgcagccgc cgcattgcat cagccatgat ggatactttc tcggcaggag 6540 

caaggtgaga tgacaggaga tcctgccccg gcacttcgcc caatagcagc cagtcccttc 6600 
ccgcttcagt gacaacgtcg agcacagctg cgcaaggaac gcccgtcgtg gccagccacg 6660 

atagccgcgc tgcctcgtct tgcagttcat tcagggcacc ggacaggtcg gtcttgacaa 6720 

aaagaaccgg gcgcccctgc gctgacagcc ggaacacggc ggcatcagag cagccgattg 6780 

tctgttgtgc ccagtcatag ccgaatagcc tctccaccca agcggccgga gaacctgcgt 6840 

gcaatccatc ttgttcaatc atgcgaaacg atcctcatcc tgtctcttga tcagatcttg 6900 

atcccctgcg ccatcagatc cttggcggcg agaaagccat ccagtttact ttgcagggct 6960 

tcccaacctt accagagggc gccccagctg gcaattccgg ttcgcttgct gtccataaaa 7020 

ccgcccagtc tagctatcgc catgtaagcc cactgcaagc tacctgcttt ctctttgcgc 7080 

ttgcgttttc ccttgtccag atagcccagt agctgacatt catccggggt cagcaccgtt 7140 

tctgcggact ggctttctac gtgaaaagga tctaggtgaa gatccttttt gataatctca 7200 

tgaccaaaat cccttaacgt gagttttcgt tccactgagc gtcagacccc gtagaaaaga 7260 

tcaaaggatc ttcttgagat cctttttttc tgcgcgtaat ctgctgcttg caaacaaaaa 7320 

aaccaccgct accagcggtg gtttgtttgc cggatcaaga gctaccaact ctttttccga 7380 

aggtaactgg cttcagcaga gcgcagatac caaatactgt ccttctagtg tagccgtagt 7440 

taggccacca cttcaagaac tctgtagcac cgcctacata cctcgctctg ctaatcctgt 7500 

taccagtggc tgctgccagt ggcgataagt cgtgtcttac cgggttggac tcaagacgat 7560 

agttaccgga taaggcgcag cggtcgggct gaacgggggg ttcgtgcaca cagcccagct 7620 

tggagcgaac gacctacacc gaactgagat acctacagcg tgagctatga gaaagcgcca 7680 

cgcttcccga agggagaaag gcggacaggt atccggtaag cggcagggtc ggaacaggag 7740 

agcgcacgag ggagcttcca gggggaaacg cctggtatct ttatagtcct gtcgggtttc 7800 

gccacctctg acttgagcgt cgatttttgt gatgctcgtc aggggggcgg agcctatgga 7860 

aaaacgccag caacgcggcc tttttacggt tcctgggctt ttgctggcct tttgctcaca 7920 

tgttctttcc tgcgttatcc cctgattctg tggataaccg tattaccgcc tttgagtgag 7980 

ctgataccgc tcgccgcagc cgaacgaccg agcgcagcga gtcagtgagc gaggaagcgg 8040 

aag 8043 



<210> 14 
<211> 7404 
<212> DNA 
<213> Viral 



<400> 14 

agcgcccaat acgcaaaccg cctctccccg cgcgttggcc gattcattaa tgcagctggc 60 

acgacaggtt tcccgactgg aaagcgggca gtgagcgcaa cgcaattaat gtgagttagc 120 

tcactcatta ggcaccccag gctttacact ttatgcttcc ggctcgtatg ttgtgtggaa 180 

ttgtgagcgg ataacaattt cacacaggaa acagctatga ccatgattac gccaagctat 240 

ttaggtgaca ctatagaata ctcaagctat gcatcaagct tggtaccgag ctcggatcca 300 

ctagtaacgg ccgccagtgt gctggaattc atgggcagac ccgtctgtac tttaagagtg 360 

ttggcaacca gtaatgaata aaaactcccg ttttattata tttgatgaat gctgaaagct 420 

tacattaata tgtcgtgcga tggcacgaaa aaacacacgc aaacaataca ggggggtagt 480 

cggcgggcgg ctaagggtgg tgctcggcgg gcagaacatc gaaaaatcaa gatctatatg 540 

aattacactt cctccgtagg aggaagcaca gggggagaat accacttctc ccccggcgac 600 

ataatgtaaa tgacgcagtt tgcctcgaaa tactccagct gccctggagt catttccttc 660 

atccaatctt catccgagtt ggcgaggatt attgtaggct tagacttctt ctgcaccttt 720 

ttcttcttac catacttggg gtttacaatg aaatccctct gacagccaac taactgtttc 780 

caacaaggac agaatttaaa cggaatatca tctacgatgt tgtagattgc gtcttcgttg 840 

tatgaagacc aatcaacatt attttgccag taattatgaa cccctaggct tctggcccaa 900 

gtagattttc cggttcttgt tgggccgacg atgtagaggc tctgctttct tgatctttca 960 

tctgatgact ggatacagaa tccatccatt ggaggtcaga aattgcatcc tcgagggtat 1020 

aacaggtagg ttgaaggagc atgtaagctt cgggactaac ctggaagatg ttaggctgga 1080 

gccaatcgtt gattgactca ttacaaagta aatcaggtga ggagggtgga tgaggattgg 1140 

tgaactcttc ctgaatctca ggaaaaagct tatttgcaga gtattcaaaa tactgcaatt 1200 

ttgtggacca atcaaagggg agctctttct ggatcatgga gaggtactct tctttggagg 1260 

tagcgtgtga aataatgtct cgcattattt catctttaga aggctttttt tcctttacct 1320 

ctgaatcaga ttttcctagg aagggggact tcctaggaat gaaagtacct ctctcaaaca 1380 



11/13 



WO 01/77350 



PCT/US01/11436 



cagccagagg ttccttgaga atgtaatccc tcactctgtt aactgacttg gcactctgaa 1440 

tatttgggtg aaacccattt atatcaaaga accttgagtc agatatcctt atcggcttct 1500 

ctggctgaag caatgcatgt aaatgcaaac ttccatcttt atgtgcctct cgggcacata 1560 

gaatatattt gggaatccaa cgaacgacga gctcccagat catctgacag gcgatttcag 1620 

gattttctgg acactttgga taggttagga acgtgttagc gttcctgtgt gagaactgac 1680 

ggttggatga ggaggaggcc atagccgacg acggaggttg aggctgaggg atggcagact 1740 

gggagctcca aactctatag tatacccgtg cgccttcgaa atccgccgct ccattgtctt 1800 

atagtggttg taaatgggcc ggaccgggcc ggcccagcag gaaaagaagg cgcgcactaa 1860 

tattaccgcg ccttcttttc ctgcgagggc ccggtaggga ccgagcgctt tgatttaaag 1920 

cctggttctg ctttgtatga tttatctaaa gcagcccaat ctaaagaaac cggtcccggg 1980 

cactataaat tgcctaacaa gtgcgattca ttcatggatc ctttaaactc gagtctagtc 2040 

ccgatctagt aacatagatg acaccgcgcg cgataattta tcctagtttg cgcgctatat 2100 

tttgttttct atcgcgtatt aaatgtataa ttgcgggact ctaatcataa aaacccatct 2160 

cataaataac gtcatgcatt acatgttaat tattacatgc ttaacgtaat tcaacagaaa 2220 

ttatatgata atcatcgaca gaccggcaac aggattcaat cttaagaaac tttattgcca 2280 

aatgtttgaa cgatcgggga aattcgctcg agttaattaa gcggccgctt aattaagtcg 2340 

acgtcctctc caaatgaaat gaacttcctt atatagagga agggtcttgc gaaggatagt 2400 

gggattgtgc gtcatccctt acgtcagtgg agatatcaca tcaatccact tgctttgaag 2460 

acgtggttgg aacgtcttct ttttccacgt agctcctcgt gggtgggggt ccatctttgg 2520 

gaccactgtc ggcagaggca tcttgaacga tagcctttcc ttatcgcaat gatggcattt 2580 

gtaggtgcca ccttcctttt ctactgtcct tttgatgaag tgacagatag ctgggcaatg 2640 

gaatccgagg aggtttcccg atattaccct ttgttgaaaa gtctcaatag ccctttggtc 2700 

ttctgagact gtatctttga tattcttgga gtagacgaga gagtgtcgtg ctccaccatg 2760 

ttgacgaatt catgggcaga cccgtctgta ctttaagagt gttggcaacc agtaatgaat 2820 

aaaaactccc gttttattat atttgatgaa tgctgaaagc ttacattaat atgtcgtgcg 2880 

atggcacgaa aaaacacacg caaacaatac aggggggtag tcggcgggcg gctaagggtg 2940 

gtgctcggcg ggcagaacat cgaaaaatca agatctatat gaattacact tcctccgtag 3000 

gaggaagcac agggggagaa taccacttct cccccggcga cataatgtaa atgacgcagt 3060 

ttgcctcgaa atactccagc tgccctggag tcatttcctt catccaatct tcatccgagt 3120 

tggcgaggat tattgtaggc ttagacttct tctgcacctt tttcttctta ccatacttgg 3180 

ggtttacaat gaaatccctc tgacagccaa ctaactgttt ccaacaagga cagaatttaa 3240 

acggaatatc atctacgatg ttgtagattg cgtcttcgtt gtatgaagac caatcaacat 3300 

tattttgcca gtaattatga acccctaggc ttctggccca agtagatttt ccggttcttg 3360 

ttgggccgac gatgtagagg ctctgctttc ttgatctttc atctgatgac tggatacaga 3420 

atccatccat tggaggtcag aaattgcatc ctcgagggta taacaggtag gttgaaggag 3480 

catgtaagct tcgggactaa cctggaagat gttaggctgg agccaatcgt tgattgactc 3540 

attacaaagt aaatcaggtg aggagggtgg atgaggattg gtgaactctt cctgaatctc 3600 

aggaaaaagc ttatttgcag agtattcaaa atactgcaat tttgtggacc aatcaaaggg 3660 

gagctctttc tggatcatgg agaggtactc ttctttggag gtagcgtgtg aaataatgtc 3720 

tcgcattatt tcatctttag aaggcttttt ttcctttacc tctgaatcag attttcctag 3780 

gaagggggac ttcctaggaa tgaaagtacc tctctcaaac acagccagag gttccttgag 3840 

aatgtaatcc ctcactctgt taactgactt ggcactctga atatttgggt gaaacccatt 3900 

tatatcaaag aaccttgagt cagatatcct tatcggcttc tctggctgaa gcaatgcatg 3960 

taaatgcaaa cttccatctt tatgtgcctc tcgggcacat agaatatatt tgggaatcca 4020 

acgaacgacg agctcccaga tcatctgaca ggcgatttca ggattttctg gacactttgg 4080 

ataggttagg aacgtgttag cgttcctgtg tgagaactga cggttggatg aggaggaggc 4140 

catagccgac gacggaggtt gaggctgagg gatggcagac tgggagctcc aaactctata 4200 

gtatacccgt gcgccttcga aatccgccgc tccattgtct tatagtggtt gtaaatgggc 4260 

cggaccgggc cggcccagca ggaaaagaag gcgcgcacta atattaccgc gccttctttt 4320 

cctgcgaggg cccggggtag ggaccgagcg ctttgattta aagcctggtt ctgctttgta 4380 

tgatttatct aaagcagccc aatctaaaga aaccggtccc gggcactata aattgcctaa 4440 

caagtgcgat tcattcatgg atcctttaaa ctcgagtcta gagggcccaa ttcgccctat 4500 

agtgagtcgt attacaattc actggccgtc gttttacaac gtcgtgactg ggaaaaccct 4560 

ggcgttaccc aacttaatcg ccttgcagca catccccctt tcgccagctg gcgtaatagc 4620 

gaagaggccc gcaccgatcg cccttcccaa cagttgcgca gcctatacgt acggcagttt 4680 

aaggtttaca cctataaaag agagagccgt tatcgtctgt ttgtggatgt acagagtgat 4740 

attattgaca cgccggggcg acggatggtg atccccctgg ccagtgcacg tctgctgtca 4800 

gataaagtct cccgtgaact ttacccggtg gtgcatatcg gggatgaaag ctggcgcatg 4860 

atgaccaccg atatggccag tgtgccggtc tccgttatcg gggaagaagt ggctgatctc 4920 

agccaccgcg aaaatgacat caaaaacgcc attaacctga tgttctgggg aatataaatg 4980 

tcaggcctga atggcgaatg gacgcgccct gtagcggcgc attaagcgcg cgggtgtggt 5040 

ggttacgcgc agcgtgaccg ctacacttgc cagcgcccta gcgcccgctc ctttcgcttt 5100 

cttcccttcc tttctcgcca cgttcgccgg ctttccccgt caagctctaa atcgggggct 5160 

ccctttaggg ttccgattta gagctttacg gcacctcgac cgcaaaaaac ttgatttggg 5220 

tgatggttca cgtagtgggc catcgccctg atagacggtt tttcgccctt tgacgttgga 5280 

gtccacgttc tttaatagtg gactcttgtt ccaaactgga acaacactca accctatcgc 5340 

ggtctattct tttgatttat aagggatgtt gccgatttcg gcctattggt taaaaaatga 5400 

gctgatttaa caaaaatttt aacaaaattc agaagaactc gtcaagaagg cgatagaagg 5460 

cgatgcgctg cgaatcggga gcggcgatac cgtaaagcac gaggaagcgg tcagcccatt 5520 

cgccgccaag ctcttcagca atatcacggg tagccaacgc tatgtcctga tagcggtccg 5580 

ccacacccag ccggccacag tcgatgaatc cagaaaagcg gccattttcc accatgatat 5640 

tcggcaagca ggcatcgcca tgggtcacga cgagatcctc gccgtcgggc atgctcgcct 5700 

tgagcctggc gaacagttcg gctggcgcga gcccctgatg ctcttcgtcc agatcatcct 5760 

gatcgacaag accggcttcc atccgagtac gtgctcgctc gatgcgatgt ttcgcttggt 5820 

ggtcgaatgg gcaggtagcc ggatcaagcg tatgcagccg ccgcattgca tcagccatga 5880 

tggatacttt ctcggcagga gcaaggtgag atgacaggag atcctgcccc ggcacttcgc 5940 

ccaatagcag ccagtccctt cccgcttcag tgacaacgtc gagcacagct gcgcaaggaa 6000 

cgcccgtcgt ggccagccac gatagccgcg ctgcctcgtc ttgcagttca ttcagggcac 6060 

cggacaggtc ggtcttgaca aaaagaaccg ggcgcccctg cgctgacagc cggaacacgg 6120 

cggcatcaga gcagccgatt gtctgttgtg cccagtcata gccgaatagc ctctccaccc 6180 

aagcggccgg agaacctgcg tgcaatccat cttgttcaat catgcgaaac gatcctcatc 6240 

ctgtctcttg atcagatctt gatcccctgc gccatcagat ccttggcggc gagaaagcca 6300 

tccagtttac tttgcagggc ttcccaacct taccagaggg cgccccagct ggcaattccg 6360 

gttcgcttgc tgtccataaa accgcccagt ctagctatcg ccatgtaagc ccactgcaag 6420 

ctacctgctt tctctttgcg cttgcgtttt cccttgtcca gatagcccag tagctgacat 6480 

tcatccgggg tcagcaccgt ttctgcggac tggctttcta cgtgaaaagg atctaggtga 6540 

agatcctttt tgataatctc atgaccaaaa tcccttaacg tgagttttcg ttccactgag 6600 

cgtcagaccc cgtagaaaag atcaaaggat cttcttgaga tccttttttt ctgcgcgtaa 6660 

tctgctgctt gcaaacaaaa aaaccaccgc taccagcggt ggtttgtttg ccggatcaag 6720 

agctaccaac tctttttccg aaggtaactg gcttcagcag agcgcagata ccaaatactg 6780 

tccttctagt gtagccgtag ttaggccacc acttcaagaa ctctgtagca ccgcctacat 6840 

acctcgctct gctaatcctg ttaccagtgg ctgctgccag tggcgataag tcgtgtctta 6900 

ccgggttgga ctcaagacga tagttaccgg ataaggcgca gcggtcgggc tgaacggggg 6960 

gttcgtgcac acagcccagc ttggagcgaa cgacctacac cgaactgaga tacctacagc 7020 

gtgagctatg agaaagcgcc acgcttcccg aagggagaaa ggcggacagg tatccggtaa 7080 

gcggcagggt cggaacagga gagcgcacga gggagcttcc agggggaaac gcctggtatc 7140 

tttatagtcc tgtcgggttt cgccacctct gacttgagcg tcgatttttg tgatgctcgt 7200 

caggggggcg gagcctatgg aaaaacgcca gcaacgcggc ctttttacgg ttcctgggct 7260 

tttgctggcc ttttgctcac atgttctttc ctgcgttatc ccctgattct gtggataacc 7320 

gtattaccgc ctttgagtga gctgataccg ctcgccgcag ccgaacgacc gagcgcagcg 7380 

agtcagtgag cgaggaagcg gaag 7404 